Abstract

We consider the problem of estimating the mixing density $f$ from $n$ i.i.d. observations distributed
according to a mixture density with unknown mixing distribution. In contrast with finite mixture models,
here the distribution of the hidden variable is not bounded to a finite set but is spread out over a given
interval. An orthogonal series estimator of the mixing density $f$ is obtained as follows. An orthonormal
sequence $(\psi_k)_k$ is constructed using Legendre polynomials. Then a standard projection estimator is
defined, that is, the first $m$ coefficients of $f$ in this basis are unbiasedly estimated from the observations.
The construction of the orthonormal sequence varies from one mixture model to another. Minimax upper
and lower bounds of the mean integrated squared error are provided which apply in various contexts. In
the specific case of exponential mixtures, it is shown that there exists a constant $A > 0$ such that, for
$m \sim A\log(n)$, the orthogonal series estimator achieves the minimax rate in a collection of specific
smoothness classes and is therefore adaptive over this collection. Other cases are investigated, such as Gamma
shape mixtures and scale mixtures of compactly supported densities, including Beta mixtures.

1. Mixture Distributions 



We consider mixture distributions of densities belonging to some parametric collection $\{\pi_t, t \in \Theta\}$
of densities with respect to the dominating measure $\zeta$ on the observation space $(\mathsf{X}, \mathcal{X})$. A general
representation of a mixture density uses the so-called mixing distribution and is of the following form

$$\pi_f(x) = \int_\Theta f(t)\,\pi_t(x)\,\mu(dt), \tag{1}$$

where the mixing density $f$ is a density with respect to some measure $\mu$ defined on $\Theta$. If $\mu$ is a counting
measure with a finite number of support points $\theta_k$, then obviously $\pi_f$ is a finite mixture distribution of the
form $\sum_{k=1}^K p_k\pi_{\theta_k}$ with weights $p_k = f(\theta_k)\mu(\{\theta_k\})$. However, if $\mu$ denotes the Lebesgue measure on $\Theta$ and if $\Theta$ is a given interval, say
$\Theta = [a,b]$, then the distribution of the latent variable $t$ is spread out over this interval and $\pi_f$ represents
a continuous mixture. In this paper we consider continuous mixtures and the problem of identifying the
mixing density $f$ when observations are available only from the continuous mixture $\pi_f$.

The literature provides various approaches to this estimation problem, as for example the nonparametric
maximum likelihood estimate (NPMLE). A characteristic feature of this estimator is that it yields a
discrete mixing distribution (Laird, 1978; Lindsay, 1983). This appears to be unsatisfactory if we have
reasons to believe that the mixing density is indeed a smooth function. In this case a functional approach
is more appropriate, which relies on smoothness assumptions on the mixing density $f$. In Zhang (1990),
kernel estimators are constructed for mixing densities of a location parameter. Goutis (1997) proposes
an iterative estimation procedure also based on kernel methods. For mixtures of discrete distributions,
that is, when the $\pi_t$ are densities with respect to a counting measure on a discrete space, orthogonal series
estimators have been developed and studied in Hengartner (1997) and Roueff & Rydén (2005). For such
mixtures, these estimators turn out to enjoy similar or better rates of convergence than the kernel estimator
presented in Zhang (1995). In this paper we show that orthogonal series estimators can also be used
in various cases where $\{\pi_t, t \in \Theta\}$ is a scale family, that is $\pi_t = t^{-1}\pi_1(t^{-1}\,\cdot)$, hence in cases where
the mixand is not a discrete distribution ($\zeta$ is the Lebesgue measure). In particular, if $\pi_1(x) = e^{-x}$,
which is called the exponential mixture case, we exhibit an orthogonal series estimator achieving the
minimax rate of convergence in a collection of smoothness classes without requiring prior knowledge
of the smoothness index. In other words, we provide an adaptive estimator of the mixing density of an
exponential mixture.

Natural science phenomena of discharge or de-excitation are commonly modeled by exponential
decays, e.g. radioactive decay, the electric discharge of a capacitor or the temperature difference
between two objects. Several examples of applications of the exponential mixture model can be found in
the references of the seminal paper Jewell (1982). Let us, however, briefly describe a specific application
where exponential decays are encountered and that motivated our study, namely time-resolved
fluorescence. Fluorescence is the emission of a photon by an excited molecule. The duration between
the excitation of the molecule and the emission of the fluorescence photon is called the fluorescence
lifetime (Lakowicz, 1999; Valeur, 2002). It is well known that these lifetimes have an exponential distribution,
whose exponential parameter depends first of all on the emitting molecule, and second on the molecule's
chemical and physical microenvironment, such as pH, temperature, viscosity and polarity. Slight differences in
the microenvironment of two molecules of the same type may yield different exponential parameters. Thus,
the distribution of fluorescence lifetimes of a large number of molecules is a mixture of a large number
of exponential distributions, which may be best modeled by a continuous exponential mixture. As the
distribution of the exponential parameters is a precious source of information on molecular processes in
various applications in biology, chemistry or medicine, the estimation of the mixing density is of major
interest in fluorescence (Ware et al., 1973).



The paper is organized as follows. In Section 2 an estimator based on a sequence of orthonormal
functions $(\psi_k)_k$ is constructed using the first $m$ coefficients $c_k$ of the expansion $f = \sum_k c_k\psi_k$. In
Section 3 we derive upper bounds on the rate of convergence of the mean integrated squared error (MISE) of the
orthogonal series estimator on some specific smoothness classes. In Section 4 the approximation classes
used for the convergence rate are related to more meaningful smoothness classes defined by weighted
moduli of smoothness. Section 5 is concerned with the investigation of the minimax rate. On the one
hand, a general lower bound of the MISE is provided and, on the other hand, some specific cases are
studied in detail. The Appendix provides some technical results.



2. Estimation Method 

In this section we develop an orthogonal series estimator and we provide several examples, namely 
for mixtures of exponential, Gamma, Beta and uniform densities. 

2.1. Orthogonal Series Estimator. Throughout this paper the following assumption will be used. 

Assumption 1. Let $\zeta$ be a dominating measure on the observation space $(\mathsf{X}, \mathcal{X})$. Let $\{\pi_t, t \in \Theta\}$ be a
parametric collection of densities with respect to $\zeta$. Furthermore, let the parameter space $\Theta = [a,b]$ be a
compact interval with known endpoints $a < b$ in $\mathbb{R}$. We denote by $X, X_1, \dots, X_n$ an i.i.d. sample from
the mixture distribution density $\pi_f$ defined by (1) with $\mu$ equal to the Lebesgue measure on $[a,b]$.

For convenience, we also denote by $\pi_t$ and $\pi_f$ the probability measures associated to these densities.
Moreover we will use the functional analysis notation $\pi_t(h)$ and $\pi_f(h)$ for the integral of $h$ with respect
to these probability measures.

The basic assumption of our estimation approach is that the mixing density $f$ in (1) is square
integrable, that is $f \in L^2[a,b]$. Then, for any complete orthonormal basis $(\psi_k)_{k\ge1}$ of the Hilbert space
$\mathbb{H} = L^2[a,b]$, the mixing density $f$ can be represented by the orthogonal series $f(t) = \sum_{k\ge1} c_k\psi_k(t)$,
where the coefficients $c_k$ correspond to the inner products of $f$ and $\psi_k$. If we have estimators $\hat c_{n,k}$ of
those coefficients, then an estimator of the mixing density $f$ is obtained by $\sum_{k=1}^m \hat c_{n,k}\psi_k$.

To construct estimators $\hat c_{n,k}$, we remark that the following relation holds. Let $g$ be a nonnegative
integrable function on $\mathsf{X}$. Define the function $\varphi$ on $[a,b]$ by the conditional expectations

$$\varphi(t) = \pi_t(g) = \int_{\mathsf{X}} g(x)\,\pi_t(x)\,\zeta(dx), \quad t \in [a,b]. \tag{2}$$

Suppose that $\varphi$ belongs to $\mathbb{H}$. The mean $\pi_f(g)$ can be written as the inner product of $f$ and $\varphi$. Namely,
by the definition of $\pi_f$ in (1) and Fubini's theorem,

$$\pi_f(g) = \int g(x)\,\pi_f(x)\,\zeta(dx) = \int f(t)\int g(x)\,\pi_t(x)\,\zeta(dx)\,dt = \langle f, \varphi\rangle_{\mathbb{H}}.$$

Consequently, by the strong law of large numbers, $n^{-1}\sum_{i=1}^n g(X_i)$ is a consistent estimator of the inner
product $\langle f, \varphi\rangle_{\mathbb{H}}$ based on an i.i.d. sample $(X_1, \dots, X_n)$ from the mixture density $\pi_f$ defined in (1).
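The identity $\pi_f(g) = \langle f, \varphi\rangle_{\mathbb{H}}$ can be checked by simulation. The sketch below is only an illustration under assumptions that are not part of the text: an exponential mixture $\pi_t(x) = te^{-tx}$, a uniform mixing density on $[a,b]$, and the test function $g(x) = \mathbb{1}\{x > 1/2\}$, for which $\varphi(t) = e^{-t/2}$ and $\langle f, \varphi\rangle_{\mathbb{H}} = 2(e^{-a/2} - e^{-b/2})/(b-a)$. The function names are hypothetical.

```python
import math
import random


def mc_inner_product(n, a, b, seed=0):
    """Empirical mean of g(X_i) = 1{X_i > 1/2} for X_i drawn from the mixture pi_f."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        t = rng.uniform(a, b)      # latent parameter drawn from f = Uniform[a, b]
        x = rng.expovariate(t)     # X | t has density pi_t(x) = t exp(-t x)
        hits += x > 0.5            # g(x) = 1{x > 1/2}
    return hits / n


def exact_inner_product(a, b):
    """<f, phi> with phi(t) = pi_t(g) = exp(-t/2) and f = 1/(b - a) on [a, b]."""
    return 2.0 * (math.exp(-a / 2) - math.exp(-b / 2)) / (b - a)


a, b = 0.5, 2.0
print(mc_inner_product(200_000, a, b), exact_inner_product(a, b))
```

With $n = 200\,000$ draws, the Monte Carlo error is of order $n^{-1/2}$, so the two printed values should typically agree to within about $0.01$.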

We make the following assumption under which the orthogonal series estimator makes sense. 

Assumption 2. Assumption 1 holds and there exists a sequence $(g_k)_{k\ge1}$ of $\mathsf{X} \to \mathbb{R}$ functions such that
$(\varphi_k)_{k\ge1}$ is a dense sequence of linearly independent functions in $\mathbb{H}$, where $\varphi_k(t) = \pi_t(g_k)$ as in (2).

We then proceed as follows. Using linear combinations of the $\varphi_k$'s, a sequence of orthonormal functions
$\psi_1, \psi_2, \dots$ in $\mathbb{H}$ can be constructed, for instance by the Gram-Schmidt procedure. Say that $\psi_k$
writes as $\sum_{j=1}^k Q_{k,j}\varphi_j$ with an array $(Q_{k,j})_{1\le j\le k}$ of real values that are computed beforehand. Then we
define estimators of $c_k = \langle f, \psi_k\rangle_{\mathbb{H}} = \sum_{j=1}^k Q_{k,j}\langle f, \varphi_j\rangle_{\mathbb{H}}$ by the empirical means

$$\hat c_{n,k} = \frac1n\sum_{i=1}^n\sum_{j=1}^k Q_{k,j}\,g_j(X_i).$$

Finally, for any integer $m$, an estimator of $f$ is given by

$$\hat f_{m,n} = \sum_{k=1}^m \hat c_{n,k}\psi_k = \frac1n\sum_{i=1}^n\sum_{j,k,l=1}^m Q_{k,j}Q_{k,l}\,g_j(X_i)\,\varphi_l, \tag{3}$$

with the convention $Q_{k,j} = 0$ for all $j > k$. We refer to $\hat f_{m,n}$ as the orthogonal series estimator or the
projection estimator of approximation order $m$.
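The array $(Q_{k,j})$ produced by the Gram-Schmidt procedure can also be read off the Gram matrix $G_{i,j} = \langle\varphi_i, \varphi_j\rangle_{\mathbb{H}}$: if $G = LL^T$ is its Cholesky factorization, then $Q = L^{-1}$ is lower triangular with a positive diagonal and $QGQ^T = I$, which is exactly the orthonormality of $\psi_k = \sum_j Q_{k,j}\varphi_j$. The Python/NumPy sketch below assumes, for illustration only, the functions $\varphi_j(t) = e^{-(j-1/2)t}$ on $[a,b]$ (the ones of Example 1(a)); the helper names are hypothetical.

```python
import numpy as np


def gram_schmidt_coefficients(gram):
    """Lower-triangular Q with Q @ gram @ Q.T = I (Gram-Schmidt on phi_1, ..., phi_m)."""
    L = np.linalg.cholesky(gram)                       # gram = L @ L.T
    return np.linalg.solve(L, np.eye(gram.shape[0]))   # Q = L^{-1}


def exp_gram(m, a, b):
    """<phi_i, phi_j> in L^2[a, b] for phi_j(t) = exp(-(j - 1/2) t), i, j = 1..m."""
    idx = np.arange(1, m + 1)
    s = idx[:, None] + idx[None, :] - 1                # (i - 1/2) + (j - 1/2)
    return (np.exp(-s * a) - np.exp(-s * b)) / s       # integral of exp(-s t) over [a, b]


G = exp_gram(4, 0.5, 1.5)
Q = gram_schmidt_coefficients(G)
print(np.allclose(Q @ G @ Q.T, np.eye(4), atol=1e-6))
```

In this sketch the Gram matrix becomes ill-conditioned quickly as $m$ grows, so the generic route is only meant as a sanity check for small $m$; under Assumption 3 the coefficients can instead be taken from the Legendre recurrence of Definition 1, which does not require forming $G$.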

Define the subspaces 

$$V_m = \operatorname{span}(\varphi_1, \dots, \varphi_m), \quad\text{for all } m \ge 1. \tag{4}$$

By Assumption 2, the sequence $(V_m)_m$ is strictly increasing, $V_m$ has dimension $m$ for all $m$, and $\bigcup_m V_m$
has closure equal to $\mathbb{H}$. By construction the orthogonal series estimator $\hat f_{m,n}$ belongs to $V_m$. Consequently,
the best squared error achievable by $\hat f_{m,n}$ is $\|f - P_{V_m}f\|_{\mathbb{H}}^2$, where $\|\cdot\|_{\mathbb{H}}$ denotes the norm
associated to $\mathbb{H}$ and $P_{V_m}$ the orthogonal projection on the space $V_m$. Hence, once the functions $g_k$ are
chosen, the definition of the subspaces $V_m$ follows and the performance of the estimator will naturally
depend on how well $f$ can be approximated by functions in $V_m$. It is thus of interest to choose a sequence
$(g_k)_{k\ge1}$ yielding a meaningful sequence of approximation spaces $(V_m)_m$. In the context of scale family
mixtures (but not only, see Roueff & Rydén (2005)), polynomial spaces appear naturally. Indeed, for any
function $g$, we have $\pi_t(g) = \pi_1(g(t\,\cdot))$, so that, provided that $\pi_1$ has finite moments, if $g$ is a polynomial
of degree $k$, so is $\varphi(t) = \pi_t(g)$. The following assumption slightly extends this choice for two
reasons. First, a scale family is not always parameterized by its scale parameter but sometimes by its inverse
(as for the exponential family). Second, it will appear that the choice of $(g_k)_{k\ge1}$ influences not only the
approximation class (and thus the bias) but also the variance. It may thus be convenient to allow the $g_k$'s
not to be polynomials, while still remaining in the context of polynomial approximation. This goal is
achieved by the following assumption.

Assumption 3. Assumption 2 holds and there exist two real numbers $a' < b'$ and a linear isometry $T$
from $\mathbb{H}$ to $\mathbb{H}' = L^2[a',b']$ such that, for all $k \ge 1$, $T\varphi_k$ is a polynomial of degree $k-1$. We denote by
$T^{-1}$ the inverse isometry.

To compute the coefficients $Q_{k,j}$ under Assumption 3, one may rely on the well-known Legendre
polynomials, which form an orthogonal sequence of polynomials in $\mathbb{H}' = L^2[a',b']$. Indeed, by choosing
$g_k$ so that $T\varphi_k$ is the polynomial $t^{k-1}$, as will be illustrated in all the examples below, the constants $Q_{k,j}$
are the coefficients of the normalized Legendre polynomials $\sum_{j=1}^k Q_{k,j}t^{j-1}$. Let us recall the definition
of the Legendre polynomials.

Definition 1 (Legendre polynomials). Let $a' < b'$ be two real numbers and denote $\mu = (a'+b')/2$
and $\delta = (b'-a')/2$. The Legendre polynomials associated to the interval $[a',b']$ are defined as the
polynomials $r_k(t) = \sum_{l=1}^k R_{k,l}t^{l-1}$, where the coefficients $R_{k,l}$ are given by the recurrence
relation

$$R_{k+1,l} = R_{k,l-1} - \mu R_{k,l} - \beta_k R_{k-1,l}, \quad\text{for all } k, l \ge 1,$$

with $R_{1,1} = 1$, $R_{k,l} = 0$ for all $l > k$, the conventions $R_{k,0} = R_{0,l} = 0$, $\beta_1 = 2\delta$ and
$\beta_k = \delta^2(k-1)^2/(4(k-1)^2-1)$ for $k \ge 2$. The obtained sequence $(r_k)_{k\ge1}$ is orthogonal in
$\mathbb{H}' = L^2([a',b'])$ with norms given by $\|r_k\|_{\mathbb{H}'} = \sqrt{\beta_1\cdots\beta_k}$.
Hence, the coefficients of the normalized Legendre polynomials are defined by the relation

$$Q_{k,l} = \frac{R_{k,l}}{\sqrt{\beta_1\cdots\beta_k}}, \quad\text{for all } k, l \ge 1. \tag{5}$$
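The recurrence of Definition 1 translates directly into code: it is the three-term recurrence $r_{k+1}(t) = (t-\mu)\,r_k(t) - \beta_k r_{k-1}(t)$ written on the monomial coefficients. The Python/NumPy sketch below (function names hypothetical) builds the lower-triangular array $(Q_{k,l})_{1\le l\le k\le m}$ of (5) and checks orthonormality through the exact Gram matrix of the monomials, $\int_{a'}^{b'} t^{p+q}\,dt$.

```python
import numpy as np


def legendre_coefficients(m, a_prime, b_prime):
    """Return Q with Q[k-1, l-1] = Q_{k,l}, the normalized Legendre coefficients of (5)."""
    mu, delta = (a_prime + b_prime) / 2.0, (b_prime - a_prime) / 2.0
    beta = np.zeros(m + 1)
    beta[1] = 2.0 * delta
    for k in range(2, m + 1):
        beta[k] = delta**2 * (k - 1) ** 2 / (4.0 * (k - 1) ** 2 - 1.0)
    R = np.zeros((m + 1, m + 1))     # row 0 and column 0 encode R_{0,l} = R_{k,0} = 0
    R[1, 1] = 1.0
    for k in range(1, m):
        for l in range(1, k + 2):
            R[k + 1, l] = R[k, l - 1] - mu * R[k, l] - beta[k] * R[k - 1, l]
    norms = np.sqrt(np.cumprod(beta[1:]))   # ||r_k|| = sqrt(beta_1 ... beta_k)
    return R[1:, 1:] / norms[:, None]


def monomial_gram(m, a_prime, b_prime):
    """Exact Gram matrix <t^p, t^q> in L^2[a', b'] for p, q = 0..m-1."""
    s = np.arange(m)[:, None] + np.arange(m)[None, :]
    return (b_prime ** (s + 1) - a_prime ** (s + 1)) / (s + 1)


Q = legendre_coefficients(5, 0.0, 1.0)
print(np.allclose(Q @ monomial_gram(5, 0.0, 1.0) @ Q.T, np.eye(5)))
```

On $[-1,1]$ this reproduces the classical normalized Legendre polynomials, e.g. $Q_{2,2} = \sqrt{3/2}$.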

2.2. Examples. For illustration we exhibit in this section the orthogonal series estimator in some special
cases. Several scale mixtures are presented. As an example of a non-scale mixture, we also consider
Gamma shape mixtures.

Example 1(a). Exponential Mixture. We first consider continuous exponential mixtures as they play a
meaningful role in physics. That is, we consider $\pi_t(x) = te^{-tx}$. For the orthogonal series estimator we
choose the functions $g_k(x) = \mathbb{1}\{x > k - 1/2\}$ for $k \ge 1$. By (2), we obtain

$$\varphi_k(t) = e^{-(k-1/2)t}. \tag{6}$$

We claim that the $\varphi_k$'s can be transformed into polynomials in the space $\mathbb{H}' = L^2[e^{-b}, e^{-a}]$. Indeed,
define, for all $f \in \mathbb{H} = L^2[a,b]$,

$$Tf(t) = f(-\log t)/\sqrt{t}, \quad t \in [e^{-b}, e^{-a}]. \tag{7}$$

Then one has $\langle Tf, Tg\rangle_{\mathbb{H}'} = \langle f, g\rangle_{\mathbb{H}}$, hence $T$ is an isometry from $\mathbb{H}$ to $\mathbb{H}'$. Moreover $T\varphi_k(t) = t^{k-1}$ are
polynomials. Denote by $p_k(t) = \sum_{j=1}^k Q_{k,j}t^{j-1}$ the normalized Legendre polynomials in $\mathbb{H}'$ with coefficients $Q_{k,j}$
defined by (5) with $a' = e^{-b}$ and $b' = e^{-a}$. Denote by $T^{-1}$ the inverse operator of $T$, given by $T^{-1}h(t) =
e^{-t/2}h(e^{-t})$. Since $T^{-1}$ is a linear isometry, we get that the functions $\psi_k = T^{-1}p_k = \sum_{j=1}^k Q_{k,j}\varphi_j$ are
orthonormal in $\mathbb{H}$. Consequently, an orthogonal series estimator is given by

$$\hat f_{m,n}(t) = \frac1n\sum_{i=1}^n\sum_{k,j,l=1}^m \mathbb{1}\{X_i > j - 1/2\}\,Q_{k,j}Q_{k,l}\,e^{-(l-1/2)t}. \tag{8}$$
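Estimator (8) can be sketched in a few lines. The Python/NumPy code below is illustrative, not the authors' implementation: it repeats the Legendre recurrence of Definition 1 so that it runs on its own, and the uniform mixing density, sample size and order $m = 2$ are arbitrary choices made for the demonstration. Gauss-Legendre quadrature on $[a,b]$ is used to check that the $\psi_k$ are orthonormal and to compute the exact coefficients $c_k = \langle f, \psi_k\rangle_{\mathbb{H}}$.

```python
import numpy as np


def legendre_coefficients(m, a_prime, b_prime):
    """Normalized Legendre coefficients Q[k-1, l-1] = Q_{k,l} of Definition 1 and (5)."""
    mu, delta = (a_prime + b_prime) / 2.0, (b_prime - a_prime) / 2.0
    beta = np.zeros(m + 1)
    beta[1] = 2.0 * delta
    for k in range(2, m + 1):
        beta[k] = delta**2 * (k - 1) ** 2 / (4.0 * (k - 1) ** 2 - 1.0)
    R = np.zeros((m + 1, m + 1))
    R[1, 1] = 1.0
    for k in range(1, m):
        for l in range(1, k + 2):
            R[k + 1, l] = R[k, l - 1] - mu * R[k, l] - beta[k] * R[k - 1, l]
    return R[1:, 1:] / np.sqrt(np.cumprod(beta[1:]))[:, None]


def exp_mixture_estimator(x, m, a, b):
    """Orthogonal series estimator (8) for pi_t(x) = t exp(-t x), t in [a, b]."""
    Q = legendre_coefficients(m, np.exp(-b), np.exp(-a))   # a' = e^{-b}, b' = e^{-a}
    j = np.arange(1, m + 1)
    g_bar = (x[:, None] > j - 0.5).mean(axis=0)             # means of g_j(X_i) = 1{X_i > j - 1/2}
    c_hat = Q @ g_bar                                       # hat c_{n,k}, k = 1..m

    def psi(t):
        phi = np.exp(-np.outer(np.atleast_1d(t), j - 0.5))  # phi_l(t) = exp(-(l - 1/2) t)
        return phi @ Q.T                                    # psi_k(t) = sum_l Q_{k,l} phi_l(t)

    def f_hat(t):
        return psi(t) @ c_hat
    return f_hat, c_hat, psi


rng = np.random.default_rng(0)
a, b, n, m = 0.5, 1.5, 200_000, 2
theta = rng.uniform(a, b, size=n)            # illustrative mixing density: uniform on [a, b]
x = rng.exponential(scale=1.0 / theta)       # X_i | theta_i ~ Exp(rate theta_i)
f_hat, c_hat, psi = exp_mixture_estimator(x, m, a, b)

nodes, weights = np.polynomial.legendre.leggauss(60)
t = 0.5 * (b - a) * nodes + 0.5 * (a + b)
w = 0.5 * (b - a) * weights
P = psi(t)
gram = P.T @ (P * w[:, None])                # approximately the identity matrix
c_exact = (P * w[:, None]).sum(axis=0) / (b - a)
print(np.round(gram, 8), c_hat, c_exact, f_hat([0.75, 1.0, 1.25]))
```

The coefficients $\hat c_{n,k}$ only involve the counts of observations exceeding $j - 1/2$, which keeps every $g_j$ bounded.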



NONPARAMETRIC ESTIMATION OF THE MIXING DENSITY 






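Example 1(b). Exponential Mixture. The choice of the functions $g_k$ is not unique and needs to be made
with care. For illustration, consider once again exponential mixtures with $\pi_t(x) = te^{-tx}$. This time we
take

$$g_k(x) = \alpha_k x^k \quad\text{with}\quad \alpha_k = \Big(\int x^k\,\pi_1(dx)\Big)^{-1} = 1/k!, \quad\text{and hence}\quad \varphi_k(t) = t^{-k},\ k \ge 1.$$

To relate $\varphi_k$ to polynomials, define the isometry $\tilde T$ from $\mathbb{H}$ to $\tilde{\mathbb{H}} = L^2[1/b, 1/a]$ by $\tilde Tf(t) = \frac1t f(\frac1t)$. We
have $\tilde T\varphi_k(t) = t^{k-1}$ for all $k \ge 1$. Furthermore, denote by $\tilde T^{-1}$ the inverse of $\tilde T$, satisfying $\tilde T^{-1}h(t) =
\frac1t h(\frac1t)$. Let $p_k(t) = \sum_{j=1}^k Q_{k,j}t^{j-1}$ be the normalized Legendre polynomials in $\tilde{\mathbb{H}}$ defined with $a' = 1/b$ and
$b' = 1/a$. Since $\tilde T^{-1}$ is an isometry, the functions $\psi_k = \tilde T^{-1}p_k = \sum_{j=1}^k Q_{k,j}\varphi_j$ are orthonormal in $\mathbb{H}$ and
the orthogonal series estimator is given by

$$\hat f_{m,n}(t) = \frac1n\sum_{i=1}^n\sum_{k,j,l=1}^m Q_{k,j}Q_{k,l}\,\frac{X_i^j}{j!}\,t^{-l}.$$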

Example 2. Gamma Shape Mixture. Polynomial estimators can be used in the context where the mixed
parameter is not necessarily a scale parameter. As pointed out earlier, they were first used for mixtures
on a discrete state space $\mathsf{X}$, such as Poisson mixtures; see Hengartner (1997) and Roueff & Rydén
(2005). Let us consider the Gamma shape mixture model. Parametric Gamma shape mixtures have been
considered in Venturini et al. (2008). For this model $\pi_t$ is the Gamma density with shape parameter $t$
and a fixed scale parameter (here set to 1 for simplicity),

$$\pi_t(x) = \frac{x^{t-1}}{\Gamma(t)}\,e^{-x}, \quad x > 0,$$

where $\Gamma$ denotes the Gamma function. This model has a continuous state space ($\zeta$ is the Lebesgue measure
on $\mathbb{R}_+$) and is not a scale mixture. We shall construct $g_k$ and $\varphi_k = \pi_\cdot(g_k)$ such that Assumption 3
holds with $T$ being the identity and $\varphi_k(t) = t^{k-1}$. Consider the following sequence of polynomials:
$p_1(t) = 1$, $p_2(t) = t$, $p_k(t) = t(t+1)\cdots(t+k-2)$ for all $k \ge 2$. Since $(p_k)_{k\ge1}$ is a sequence
of polynomials with degrees $k-1$, there are coefficients $(c_{k,l})_{1\le l\le k}$ such that $t^{k-1} = \sum_{l=1}^k c_{k,l}p_l(t)$ for
$k = 1, 2, \dots$. A simple recursive formula for computing $(c_{k,l})_{1\le l\le k}$ is provided in Lemma 6 in the
Appendix, see Eq. (43). Observe that, for any $l \ge 1$,

$$\int x^{l-1}\pi_t(x)\,dx = \frac{\Gamma(t+l-1)}{\Gamma(t)} = p_l(t).$$

Hence, setting $g_k(x) = \sum_{l=1}^k c_{k,l}x^{l-1}$, we obtain

$$\varphi_k(t) = \pi_t(g_k) = \sum_{l=1}^k c_{k,l}\,p_l(t) = t^{k-1},$$

and thus Assumption 3 holds with $T$ being the identity operator and $\varphi_k(t) = t^{k-1}$. Define $(Q_{k,l})_{k,l}$ as
the coefficients of the normalized Legendre polynomials on $\mathbb{H} = L^2([a,b])$, that is as in (5) with $a' = a$ and $b' = b$. The
polynomial estimator defined by (3) reads

$$\hat f_{m,n}(t) = \frac1n\sum_{i=1}^n\sum_{k,j,l=1}^m Q_{k,j}Q_{k,l}\sum_{h=1}^j c_{j,h}X_i^{h-1}\,t^{l-1}.$$



Example 3. Scale Mixture of Beta Distributions or Uniform Distributions. From a mathematical point
of view, scale mixtures are interesting as they define classes of densities that verify some monotonicity
constraints. It is well known that any monotone non-increasing density function with support in $(0, +\infty)$
can be written as a mixture of uniform densities $U[0,t]$ (Feller, 1971, p. 158). Moreover, a $k$-monotone
density is defined as a non-increasing, convex density function $h$ whose derivatives satisfy, for all $j =
1, \dots, k-2$, that $(-1)^j h^{(j)}$ is non-negative, non-increasing and convex. One can show that any $k$-monotone
density can be represented by a scale mixture of Beta distributions $B(1,k)$. Furthermore,
densities that are $k$-monotone for any $k \ge 1$, also called completely monotone functions, can be written
as a continuous mixture of exponential distributions (Balabdaoui & Wellner, 2007).

We now consider $k$-monotone densities with $k \ge 1$ that can be represented by a scale mixture of Beta
distributions $B(1,k)$ with

$$\pi_t(x) = \frac{k}{t}\Big(1 - \frac{x}{t}\Big)^{k-1}, \quad\text{for } x \in [0,t].$$

Note that if $k = 1$, then $\pi_t$ is the uniform density $U(0,t)$. We take

$$g_p(x) = \alpha_p x^{p-1} \quad\text{with}\quad \alpha_p = \Big(\int x^{p-1}\pi_1(dx)\Big)^{-1} = \frac{1}{k\,\beta(p,k)}, \quad p \ge 1,$$

where $\beta(a,b) = \int_0^1 t^{a-1}(1-t)^{b-1}\,dt$ denotes the Beta function. It follows that $\varphi_p(t) = t^{p-1}$. As in
the preceding example, if $f \in \mathbb{H}$ then an orthogonal series estimator $\hat f_{m,n}$ of $f$ can be constructed by
using the normalized Legendre polynomials $p_q(t) = \sum_{j=1}^q Q_{q,j}t^{j-1}$, where the coefficients $Q_{q,j}$ are defined as in (5)
with $a' = a$ and $b' = b$. Then, according to (3), the corresponding orthogonal series estimator is given by

$$\hat f_{m,n}(t) = \frac1n\sum_{i=1}^n\sum_{j,q,l=1}^m Q_{q,j}Q_{q,l}\,\alpha_j X_i^{j-1}\,t^{l-1}.$$
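The normalization $\alpha_p = 1/(k\beta(p,k))$ is what makes $\pi_f(g_p) = \langle f, \varphi_p\rangle_{\mathbb{H}}$ with $\varphi_p(t) = t^{p-1}$. The sketch below checks this by simulation, writing $X = \theta X_0$ with $\theta \sim f$ and $X_0 \sim B(1,k)$ independent. The uniform mixing density on $[a,b]$, for which $\langle f, \varphi_p\rangle_{\mathbb{H}} = (b^p - a^p)/(p(b-a))$, and all numerical settings are illustrative assumptions; the helper names are hypothetical.

```python
import math
import numpy as np


def beta_fn(p, q):
    """Beta function beta(p, q) computed through log-Gamma."""
    return math.exp(math.lgamma(p) + math.lgamma(q) - math.lgamma(p + q))


def g_bar(x, m, k):
    """Empirical means of g_p(X) = X^(p-1) / (k beta(p, k)) for p = 1..m."""
    p = np.arange(1, m + 1)
    alpha = np.array([1.0 / (k * beta_fn(int(q), k)) for q in p])
    return alpha * np.mean(x[:, None] ** (p - 1), axis=0)


rng = np.random.default_rng(1)
a, b, k, n, m = 1.0, 2.0, 2, 400_000, 4
theta = rng.uniform(a, b, size=n)             # theta ~ f, here uniform on [a, b]
x = theta * rng.beta(1.0, k, size=n)          # X = theta * X0 with X0 ~ B(1, k)
exact = np.array([(b ** p - a ** p) / (p * (b - a)) for p in range(1, m + 1)])
print(g_bar(x, m, k), exact)                  # both approximate <f, phi_p> = E[theta^(p-1)]
```

The vector returned by `g_bar` is the $\bar g$ that, multiplied by the Legendre array $Q$, gives the coefficient estimates $\hat c_{n,k}$.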

In Example 1(b) we considered functions $g_p$ of the same monomial form, but here Assumption 3 holds with $T$ equal to the
identity operator on $\mathbb{H}$. This difference stems from the parametrization of the exponential family by the
inverse of the scale parameter.

3. Analysis of the Orthogonal Series Estimator 
In this section the properties of the orthogonal series estimator are analyzed. 

3.1. Bias, Variance and MISE. It is useful to write the orthogonal series estimator $\hat f_{m,n}$ defined in (3)
in matrix notation. Therefore, we introduce the $m \times m$ matrix $Q = (Q_{k,j})_{k,j}$, where $Q_{k,j} = 0$ for all
$j > k$, and the $m$-vectors

$$\Phi = [\varphi_1, \dots, \varphi_m]^T, \qquad \Psi = [\psi_1, \dots, \psi_m]^T = Q\Phi,$$
$$g(x) = [g_1(x), \dots, g_m(x)]^T, \qquad \bar g = \frac1n\sum_{i=1}^n g(X_i),$$
$$c = [c_1, \dots, c_m]^T = \langle\Psi, f\rangle_{\mathbb{H}}, \qquad \hat c = [\hat c_{n,1}, \dots, \hat c_{n,m}]^T = Q\bar g.$$

It follows that the orthogonal series estimator can be written as

$$\hat f_{m,n} = \hat c^T\Psi = \bar g^TQ^TQ\Phi.$$

Further, let $\Sigma = \pi_f(gg^T) - \pi_f(g)\pi_f(g)^T$ be the covariance matrix of $g(X_1)$. The MISE is defined by
$\mathbb{E}\|\hat f_{m,n} - f\|_{\mathbb{H}}^2$. The orthogonal projection of $f$ on $V_m$ is denoted by

$$P_{V_m}f = c^T\Psi = \sum_{k=1}^m c_k\psi_k.$$

It is clear that the orthogonal series estimator $\hat f_{m,n}$ is an unbiased estimator of $P_{V_m}f$. Furthermore,
by the usual argument, the MISE is decomposed into two terms representing the integrated variance and
integrated squared bias, as summarized in the following result, whose proof is standard and thus omitted.

Proposition 1. Suppose that Assumption 2 holds. The orthogonal series estimator $\hat f_{m,n}$ defined in (3)
satisfies

(i) For every $t \in [a,b]$, $\mathbb{E}[\hat f_{m,n}(t)] = P_{V_m}f(t)$.

(ii) For every $t \in [a,b]$, $\operatorname{Var}(\hat f_{m,n}(t)) = n^{-1}\,\Psi^T(t)\,Q\Sigma Q^T\,\Psi(t)$.

(iii) $\mathbb{E}\|\hat f_{m,n} - f\|_{\mathbb{H}}^2 = \|P_{V_m}f - f\|_{\mathbb{H}}^2 + n^{-1}\operatorname{tr}(Q\Sigma Q^T)$.



An important issue for orthogonal series estimators $\hat f_{m,n}$ is the choice of the approximation order $m$.
The integrated squared bias $\|P_{V_m}f - f\|_{\mathbb{H}}^2$ only depends on how well $P_{V_m}f$ approximates $f$, and its rate
of convergence depends on the smoothness class to which the density $f$ belongs. To be more precise,
define, for any approximation rate index $\alpha > 0$ and radius $C > 0$, the approximation class

$$\mathcal{C}(\alpha, C) = \big\{f \in \mathbb{H} : \|f\|_{\mathbb{H}} \le C \text{ and } \|P_{V_m}f - f\|_{\mathbb{H}} \le Cm^{-\alpha} \text{ for all } m \ge 1\big\}. \tag{9}$$

So when the mixing density $f$ belongs to $\mathcal{C}(\alpha, C)$, the bias of the orthogonal series estimator $\hat f_{m,n}$
is well controlled, namely it decreases at the rate $m^{-\alpha}$ as $m$ increases. Furthermore, denote the set of
densities in $\mathbb{H}$ by $\mathbb{H}_1 = \{f \in \mathbb{H} : f \ge 0, \int_a^b f(t)\,dt = 1\}$. We will investigate the rate of convergence
of $\hat f_{m,n}$ in $\mathbb{H}$ when $f \in \mathcal{C}(\alpha, C) \cap \mathbb{H}_1$. We will obtain the best achievable rate in the case of exponential
mixtures and almost the best one in the case of Gamma shape mixtures.



3.2. Upper Bound of the MISE. We now provide an upper bound of the MISE for the orthogonal series
estimator based on Legendre polynomials, that is, when Assumption 3 holds.

To show an upper bound of the MISE we use the following property (see Roueff & Rydén, 2005,
Lemma A.1). If

$$\lambda > \frac{2+|a'+b'|}{b'-a'} + \sqrt{1 + \frac{(2+|a'+b'|)^2}{(b'-a')^2}},$$

then the coefficients of the normalized Legendre polynomials in $L^2[a',b']$ defined by (5) verify

$$\sum_{l=1}^k Q_{k,l}^2 = O(\lambda^{2k}), \quad\text{as } k \to \infty. \tag{10}$$
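The growth rate in (10) can be observed numerically. The sketch below recomputes the normalized Legendre coefficients with the recurrence of Definition 1 (repeated so that the block is self-contained) and prints $(\sum_l Q_{k,l}^2)^{1/(2k)}$ next to the threshold on $\lambda$ stated above. The interval $[a',b'] = [e^{-2}, e^{-1}]$, which corresponds to Example 1(a) with $[a,b] = [1,2]$, and the order $m = 10$ are illustrative choices; the function names are hypothetical.

```python
import math
import numpy as np


def legendre_coefficients(m, a_prime, b_prime):
    """Normalized Legendre coefficients Q[k-1, l-1] = Q_{k,l} of Definition 1 and (5)."""
    mu, delta = (a_prime + b_prime) / 2.0, (b_prime - a_prime) / 2.0
    beta = np.zeros(m + 1)
    beta[1] = 2.0 * delta
    for k in range(2, m + 1):
        beta[k] = delta**2 * (k - 1) ** 2 / (4.0 * (k - 1) ** 2 - 1.0)
    R = np.zeros((m + 1, m + 1))
    R[1, 1] = 1.0
    for k in range(1, m):
        for l in range(1, k + 2):
            R[k + 1, l] = R[k, l - 1] - mu * R[k, l] - beta[k] * R[k - 1, l]
    return R[1:, 1:] / np.sqrt(np.cumprod(beta[1:]))[:, None]


def lambda_threshold(a_prime, b_prime):
    """Threshold on lambda in the property leading to (10)."""
    u = (2.0 + abs(a_prime + b_prime)) / (b_prime - a_prime)
    return u + math.sqrt(1.0 + u * u)


a_p, b_p, m = math.exp(-2.0), math.exp(-1.0), 10
Q = legendre_coefficients(m, a_p, b_p)
lam0 = lambda_threshold(a_p, b_p)
for k in range(1, m + 1):
    s_k = float(np.sum(Q[k - 1] ** 2))                 # sum_l Q_{k,l}^2
    print(k, round(s_k ** (1.0 / (2 * k)), 3), round(lam0, 3))
```

Narrow intervals $[a',b']$ push the threshold up, so the coefficients, and with them the variance of the estimator, grow faster with $k$.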

By combining Proposition 1(iii) and the bound given in (10), along with a normalization condition on
the $g_k$'s (Condition (11) or Condition (14) below), we obtain the following asymptotic upper bounds of
the MISE.

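Theorem 1. Let $\alpha$ be a positive rate index and $C$ be a positive radius. Suppose that Assumption 3 holds
with $f \in \mathcal{C}(\alpha, C) \cap \mathbb{H}_1$. Let $\hat f_{m,n}$ be defined by (3) with Legendre polynomial coefficients $Q_{k,j}$ given
by (5). Then the two following assertions hold.

(a) Suppose that, for some constants $C_0 > 0$ and $B \ge 1$, we have

$$\operatorname{Var}(g_k(X)) \le C_0 B^{2k} \quad\text{for all } k \ge 1. \tag{11}$$

Set $m_n = A\log n$ with

$$A < \left(2\log B + 2\log\left(\frac{2+|a'+b'|}{b'-a'} + \sqrt{1 + \frac{(2+|a'+b'|)^2}{(b'-a')^2}}\right)\right)^{-1}. \tag{12}$$

Then, as $n \to \infty$,

$$\mathbb{E}\big\|\hat f_{m_n,n} - f\big\|_{\mathbb{H}}^2 \le C^2 m_n^{-2\alpha}(1 + o(1)), \tag{13}$$

where the $o$-term only depends on the constants $\alpha$, $C$, $a'$, $b'$, $A$ and $C_0$.

(b) Suppose that, for some constants $C_0 > 0$ and $\eta > 0$, we have

$$\operatorname{Var}(g_k(X)) \le C_0 k^{\eta k} \quad\text{for all } k \ge 1. \tag{14}$$

Set $m_n = A\log n/\log\log n$ with $A < \eta^{-1}$. Then, as $n \to \infty$,

$$\mathbb{E}\big\|\hat f_{m_n,n} - f\big\|_{\mathbb{H}}^2 \le C^2 m_n^{-2\alpha}(1 + o(1)), \tag{15}$$

where the $o$-term only depends on the constants $\alpha$, $C$, $a'$, $b'$, $A$ and $C_0$.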

Remark 1. The larger $A$, the lower the upper bound in (13). Hence, since $a'$, $b'$ and $B$ directly depend on
the $g_k$'s, the constraint (12) on $A$ indicates how appropriate the choice of the $g_k$'s is.
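Constraint (12) is easy to evaluate for a given choice of the $g_k$'s. The sketch below computes the supremum of the admissible constants $A$ and the resulting order $m_n = A\log n$; rounding $m_n$ down to an integer of at least 1 and taking $A$ as $90\%$ of the supremum are our own illustrative choices, and the function names are hypothetical. The numbers correspond to Example 1(a) on $[a,b] = [0.5, 1.5]$, where $a' = e^{-b}$, $b' = e^{-a}$ and $B = 1$.

```python
import math


def max_rate_constant(a_prime, b_prime, B=1.0):
    """Supremum of the constants A allowed by (12); requires B >= 1."""
    u = (2.0 + abs(a_prime + b_prime)) / (b_prime - a_prime)
    lam0 = u + math.sqrt(1.0 + u * u)
    return 1.0 / (2.0 * math.log(B) + 2.0 * math.log(lam0))


def order_choice(n, A):
    """m_n = A log n, rounded down and kept at least 1 (rounding rule is an assumption)."""
    return max(1, int(A * math.log(n)))


A_sup = max_rate_constant(math.exp(-1.5), math.exp(-0.5))
A = 0.9 * A_sup
for n in (10**3, 10**5, 10**7):
    print(n, order_choice(n, A))
```

Because $m_n$ grows only logarithmically, the admissible orders stay very small for realistic sample sizes, which is consistent with the logarithmic rates in (13).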

Remark 2. In the examples treated in this paper, $C_0$ and $B$ or $\eta$ can be chosen independently of $f \in
\mathcal{C}(\alpha, C) \cap \mathbb{H}_1$. Consequently, the bounds given in (13) and (15) show that $\hat f_{m_n,n}$ achieves the MISE rates
$(\log n)^{-2\alpha}$ and $(\log n/\log\log n)^{-2\alpha}$, respectively, uniformly on $f \in \mathcal{C}(\alpha, C) \cap \mathbb{H}_1$. In the exponential
mixture case, we show below that $\hat f_{m_n,n}$ of Example 1(a) is minimax rate adaptive in these classes
(since $m_n$ does not depend on $\alpha$). In the Gamma shape mixture case, we could only show that $\hat f_{m_n,n}$ of
Example 2 is minimax rate adaptive in these classes up to the multiplicative $\log\log n$ term.

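Proof. We first consider Case (a). Denote by $\lambda_0$ the threshold appearing in (10), namely
$\lambda_0 = \frac{2+|a'+b'|}{b'-a'} + \sqrt{1 + \frac{(2+|a'+b'|)^2}{(b'-a')^2}}$. By (12), $A < 1/(2\log(B\lambda_0))$, so we may choose a number $\lambda$
lying strictly between $\lambda_0$ and $e^{1/(2A)}/B$. Note that from Condition (11), it follows by the Cauchy-Schwarz
inequality that $|\Sigma_{k,l}| = |\operatorname{Cov}(g_k(X), g_l(X))| \le C_0B^kB^l$ for all $k, l$. Thus, we obtain

$$\operatorname{tr}(Q\Sigma Q^T) \le C_0\sum_{k=1}^m\sum_{j=1}^m\sum_{l=1}^m |Q_{k,j}Q_{k,l}|\,B^jB^l \le C_0\sum_{k=1}^m\Big(\sum_{j=1}^k Q_{k,j}^2\Big)\Big(\sum_{j=1}^k B^{2j}\Big) \le Km(B\lambda)^{2m},$$

where the last inequality comes from (10) and $K$ is a positive constant (the multiplicative term $m$ is
necessary only for $B = 1$). It follows by the decomposition of the MISE in Proposition 1(iii) that

$$\mathbb{E}\big\|\hat f_{m_n,n} - f\big\|_{\mathbb{H}}^2 \le C^2m_n^{-2\alpha} + Kn^{-1}m_n(B\lambda)^{2m_n} = C^2m_n^{-2\alpha}\Big(1 + C^{-2}Kn^{-1}m_n^{2\alpha+1}(B\lambda)^{2m_n}\Big).$$

Now we have, for $m_n = A\log n$,

$$n^{-1}m_n^{2\alpha+1}(B\lambda)^{2m_n} = A^{2\alpha+1}(\log n)^{2\alpha+1}\,n^{2A\log(B\lambda)-1} = o(1),$$

since $A < 1/(2\log(B\lambda))$.

Let us now consider Case (b). Proceeding as above, for any $\lambda > \lambda_0$, we get
$\operatorname{tr}(Q\Sigma Q^T) \le KC_0\lambda^{2m}m^{1+\eta m}$, which yields

$$\mathbb{E}\big\|\hat f_{m_n,n} - f\big\|_{\mathbb{H}}^2 \le C^2m_n^{-2\alpha}\Big(1 + C^{-2}KC_0\,n^{-1}m_n^{2\alpha+1+\eta m_n}\lambda^{2m_n}\Big).$$

To conclude, it suffices to check that the log of the second term between parentheses tends to $-\infty$ as
$n \to \infty$ for $m_n = A\log n/\log\log n$ with $A < \eta^{-1}$, which is easily done. □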

Let us check the validity of Condition (11) or Condition (14) for the above examples.

Example 1(a). Exponential Mixture (continued). Condition (11) immediately holds with $B = C_0 = 1$
for the exponential mixture of Example 1(a) since $g_k(x) = \mathbb{1}\{x > k - 1/2\}$ is an indicator function.

Example 1(b). Exponential Mixture (continued). Interestingly, Condition (11) does not hold for
Example 1(b), where a different choice of $g_k$'s is proposed. In fact, one finds that $\log\operatorname{Var}(g_k(X))$ is of order
$k\log(k)$. Hence, only Condition (14) holds and we fall in case (b) of Theorem 1. Since a slower rate is
achieved in this case, this clearly advocates choosing the estimator obtained in Example 1(a) rather than
the one in Example 1(b) for the exponential mixture model.

Example 2. Gamma Shape Mixture (continued). Here we set $g_k(x) = \sum_{l=1}^k c_{k,l}x^{l-1}$, where the
coefficients $(c_{k,l})$ are those defined and computed in Lemma 6 of the Appendix. Using the bound given by (44)
in the same lemma, we obtain that $|g_k(x)| \le k!\,(1 \vee |x|^{k-1})$. It follows that $\pi_t(g_k^2) \le (k!)^2(1 + \Gamma(t +
2k - 2)/\Gamma(t))$, and, for any $f \in \mathbb{H}_1$, $\pi_f(g_k^2) \le (k!)^2(1 + \Gamma(b + 2k - 2)/\Gamma(b))$. Hence, by Stirling's
formula, we find that Condition (14) holds for $\eta = 4$ and some $C_0$ independent of $f \in \mathbb{H}_1$.

Example 3. Scale Mixture of Beta Distributions or Uniform Distributions (continued). We now verify
Condition (11) for Beta mixtures and the $g_p$ of Example 3. Note that we can write $X = \theta X_0$ with
independent random variables $\theta \sim f$ and $X_0 \sim B(1,k)$. We have, for all $p \ge 1$,

$$\operatorname{Var}(g_p(X)) \le \mathbb{E}[g_p(X)^2] = \frac{\mathbb{E}[\theta^{2(p-1)}]\,\mathbb{E}[X_0^{2(p-1)}]}{k^2\beta^2(p,k)} = \frac{\mathbb{E}[\theta^{2(p-1)}]\,\beta(2p-1,k)}{k\,\beta^2(p,k)}.$$

Hence Condition (11) holds with $B = k$ if $b \le 1$, and with $B = bk$ if $b > 1$.

A close inspection of Example 3 indicates that it is a particular case of the following more general 
result concerning mixtures of compactly supported scale families. 

Lemma 1. Suppose that Assumption 1 holds in the context of a scale mixture on $\mathbb{R}_+$, that is, $\zeta$ is the
Lebesgue measure on $\mathbb{R}_+$ and $\pi_t = t^{-1}\pi_1(t^{-1}\,\cdot)$ for all $t \in \Theta = [a,b] \subset (0,\infty)$. Assume in addition
that $\pi_1$ is compactly supported in $\mathbb{R}_+$. Define, for all $k \ge 1$,

$$g_k(x) = \Big(\int x^{k-1}\pi_1(x)\,dx\Big)^{-1}x^{k-1}.$$

Then Assumption 2 holds with $\varphi_k(t) = t^{k-1}$, and thus so does Assumption 3 with $T$ being the identity
operator on $L^2([a,b])$. Moreover there exist $C_0$ and $B$ only depending on $\pi_1$ and $b$ such that
Condition (11) holds.

Proof. Using the assumptions on $\pi_1$ and Jensen's inequality, we have

$$B_1^m \le \int x^m\pi_1(x)\,dx \le B_2^m \quad\text{for all } m \ge 1,$$

with $B_1 = \int x\,\pi_1(x)\,dx$ and $B_2 > 0$ such that the support of $\pi_1$ is included in $[0, B_2]$. The result then
follows from the same computations as in Example 3. □

An immediate consequence of Theorem 1 and Lemma 1 is the following.

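Corollary 1. Under the assumptions of Lemma 1, the estimator $\hat f_{m_n,n}$ defined by (3) with Legendre
polynomial coefficients $Q_{k,j}$ given by (5), and with $m_n$ chosen as in Theorem 1(a), achieves the MISE rate $(\log n)^{-2\alpha}$
uniformly on $f \in \mathcal{C}(\alpha, C) \cap \mathbb{H}_1$ for any $\alpha > 0$ and $C > 0$.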
4. Approximation Classes

Although the approximation classes $\mathcal{C}(\alpha, C)$ appear naturally when studying the bias of the orthogonal
series estimator defined in (3), it is legitimate to ask whether such classes can be interpreted in a more
intuitive way, say using a smoothness criterion. This section provides a positive answer to this question.

4.1. Weighted Moduli of Smoothness. Let us recall the concept of weighted moduli of smoothness as
introduced by Ditzian & Totik (1987) for studying the rate of polynomial approximations. For $a < b$ in
$\mathbb{R}$, $f : [a,b] \to \mathbb{R}$, $r \in \mathbb{N}^*$ and $h \in \mathbb{R}$, denote by $\Delta_h^r(f, \cdot)$ the symmetric difference of $f$ of order $r$ with
step $h$, that is

$$\Delta_h^r(f, x) = \sum_{i=0}^r \binom{r}{i}(-1)^{r-i} f\big(x + (i - r/2)h\big), \tag{16}$$

with the convention that $\Delta_h^r(f, x) = 0$ if $x \pm rh/2 \notin [a,b]$. Define the step-weight function $\varphi$ on the
bounded interval $[a,b]$ as $\varphi(x) = \sqrt{(x-a)(b-x)}$. Then, for $f : [a,b] \to \mathbb{R}$, the weighted modulus of
smoothness of $f$ of order $r$ with the step-weight function $\varphi$ in the $L^p([a,b])$ norm is defined as

$$\omega_\varphi^r(f, t)_p = \sup_{0 < h \le t}\big\|\Delta_{h\varphi(\cdot)}^r(f, \cdot)\big\|_p.$$

We recall an equivalence relation of the modulus of smoothness with the so-called K-functional,
which is defined as

$$K_{r,\varphi}(f, t^r)_p = \inf_h\big\{\|f - h\|_p + t^r\|\varphi^r h^{(r)}\|_p : h^{(r-1)} \in A.C._{loc}\big\}, \tag{17}$$

where $h^{(r-1)} \in A.C._{loc}$ means that $h$ is $r-1$ times differentiable and $h^{(r-1)}$ is absolutely continuous on
every closed finite interval. If $f \in L^p([a,b])$, then

$$M^{-1}\omega_\varphi^r(f, t)_p \le K_{r,\varphi}(f, t^r)_p \le M\omega_\varphi^r(f, t)_p, \quad\text{for } t \le t_0, \tag{18}$$

for some constants $M$ and $t_0$; see Theorem 6.1.1 in Ditzian & Totik (1987).
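The symmetric difference (16) and the weighted modulus can be approximated on a grid. The sketch below is an illustration under stated assumptions: the $L^p$ norm is replaced by a Riemann sum on a uniform grid of $[a,b]$, and the supremum over $0 < h \le t$ by a maximum over finitely many steps, so it only approximates $\omega_\varphi^r(f,t)_p$. The function names and grid sizes are hypothetical.

```python
import numpy as np
from math import comb


def sym_diff(f, x, step, r, a, b):
    """Delta^r_step(f, x) of (16), pointwise; 0 where x +/- r*step/2 leaves [a, b]."""
    x = np.asarray(x, dtype=float)
    step = np.broadcast_to(np.asarray(step, dtype=float), x.shape)
    inside = (x - r * step / 2 >= a) & (x + r * step / 2 <= b)
    out = np.zeros_like(x)
    xs, hs = x[inside], step[inside]
    for i in range(r + 1):
        out[inside] += comb(r, i) * (-1) ** (r - i) * f(xs + (i - r / 2) * hs)
    return out


def weighted_modulus(f, t, r, a, b, p=2.0, n_x=2001, n_h=40):
    """Grid approximation of omega^r_phi(f, t)_p with phi(x) = sqrt((x - a)(b - x))."""
    x = np.linspace(a, b, n_x)
    phi = np.sqrt(np.clip((x - a) * (b - x), 0.0, None))
    dx = (b - a) / (n_x - 1)
    best = 0.0
    for h in np.linspace(t / n_h, t, n_h):            # finitely many steps in (0, t]
        d = sym_diff(f, x, h * phi, r, a, b)          # variable step h * phi(x)
        best = max(best, float((np.sum(np.abs(d) ** p) * dx) ** (1.0 / p)))
    return best


for t in (0.1, 0.2, 0.4):
    print(t, weighted_modulus(np.sqrt, t, 1, 0.0, 1.0))
```

A polynomial of degree less than $r$ has $\Delta^r_{h\varphi(x)}(f,x) = 0$ wherever the difference is defined, so its computed modulus is zero up to rounding; the step $h\varphi(x)$ shrinks near the endpoints, which is what allows functions such as $\sqrt{x}$ on $[0,1]$ to be handled.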



4.2. Equivalence Result. We show that the classes $\mathcal{C}(\alpha, C)$ are equivalent to classes defined using
weighted moduli of smoothness. This, in turn, will relate them to Sobolev and Hölder classes. To make
this precise, we define, for constants $\alpha > 0$ and $C > 0$, the following class of functions in $\mathbb{H} = L^2([a,b])$:

$$\tilde{\mathcal{C}}(\alpha, C) = \big\{f \in \mathbb{H} : \|f\|_{\mathbb{H}} \le C \text{ and } \omega_\varphi^r(f, t)_2 \le Ct^\alpha \text{ for all } t > 0\big\}, \tag{19}$$

where $\varphi(x) = \sqrt{(x-a)(b-x)}$ and $r = \lfloor\alpha\rfloor + 1$.

The following theorem states the equivalence of the classes $\tilde{\mathcal{C}}(\alpha, C)$ and $\mathcal{C}(\alpha, C)$. This result is an
extension of Proposition 7 in Roueff & Rydén (2005) to the case where the subspaces $V_m$ correspond
to transformed polynomial classes through an isometry $T$ which includes both a multiplication and a
composition with smooth functions.

Theorem 2. Let $\alpha > 0$. Suppose that Assumption 3 holds with a linear isometry $T : \mathbb{H} = L^2([a,b]) \to
\mathbb{H}' = L^2([a',b'])$ given by $Tg = \sigma \times g\circ\tau$, where $\sigma$ is non-negative and $\lfloor\alpha\rfloor + 1$ times continuously
differentiable, and $\tau$ is $\lfloor\alpha\rfloor + 1$ times continuously differentiable with a non-vanishing first derivative.
Then there exist positive constants $C_1$ and $C_2$ such that, for all $C > 0$,

$$\tilde{\mathcal{C}}(\alpha, C_1C) \subset \mathcal{C}(\alpha, C) \subset \tilde{\mathcal{C}}(\alpha, C_2C), \tag{20}$$

where $\tilde{\mathcal{C}}(\alpha, C)$ is defined in (19) and $\mathcal{C}(\alpha, C)$ is defined in (9) with approximation spaces $(V_m)$ given
by (4).
For short, we write $\tilde{\mathcal{C}}(\alpha, \cdot) \hookrightarrow \mathcal{C}(\alpha, \cdot)$ when there exists $C_1 > 0$ such that the first inclusion in (20)
holds for all $C > 0$. The validity of both inclusions is denoted by the equivalence $\tilde{\mathcal{C}}(\alpha, \cdot) \asymp \mathcal{C}(\alpha, \cdot)$.

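Proof of Theorem 2. Weighted moduli of smoothness are used to characterize the rate of polynomial
approximations. We start by relating $\mathcal{C}(\alpha, C)$ to classes defined by the rate of polynomial approximations,
namely

$$\hat{\mathcal{C}}(\alpha, C) = \Big\{g \in \mathbb{H}' : \|g\|_{\mathbb{H}'} \le C \text{ and } \inf_{p \in \mathcal{P}_{m-1}}\|g - p\|_{\mathbb{H}'} \le Cm^{-\alpha} \text{ for all } m \ge 1\Big\},$$

where $\mathcal{P}_m$ is the set of polynomials of degree at most $m$. Indeed, since $T$ is a linear isometry mapping $V_m$
onto $\mathcal{P}_{m-1}$, we see that

$$\begin{aligned}
\mathcal{C}(\alpha, C) &= \{f \in \mathbb{H} : \|f\|_{\mathbb{H}} \le C \text{ and } \|P_{V_m}f - f\|_{\mathbb{H}} \le Cm^{-\alpha} \text{ for all } m \ge 1\}\\
&= \{T^{-1}g : g \in \mathbb{H}',\ \|g\|_{\mathbb{H}'} \le C \text{ and } \|P_{\mathcal{P}_{m-1}}g - g\|_{\mathbb{H}'} \le Cm^{-\alpha} \text{ for all } m \ge 1\}\\
&= T^{-1}\big(\hat{\mathcal{C}}(\alpha, C)\big).
\end{aligned}$$

As stated in Corollary 7.25 in Ditzian & Totik (1987), we have the equivalence $\hat{\mathcal{C}}(\alpha, \cdot) \asymp \tilde{\mathcal{C}}'(\alpha, \cdot)$, where
$\tilde{\mathcal{C}}'(\alpha, C)$ is defined as $\tilde{\mathcal{C}}(\alpha, C)$ but with $a'$ and $b'$ replacing $a$ and $b$. Hence, it only remains to show that

$$T^{-1}\big(\tilde{\mathcal{C}}'(\alpha, \cdot)\big) \asymp \tilde{\mathcal{C}}(\alpha, \cdot). \tag{21}$$

To show this, we use the assumed particular form of $T$, that is $T(g) = \sigma \times g\circ\tau$. Since $T$ is an isometry
from $\mathbb{H} = L^2([a,b])$ to $\mathbb{H}' = L^2([a',b'])$ and $\sigma$ is non-negative, we necessarily have that $\tau$ is a bijection
from $[a',b']$ to $[a,b]$ (whose inverse bijection is denoted by $\tau^{-1}$) and $\sigma = \sqrt{|\tau'|}$. Moreover the
inverse isometry writes $T^{-1}(g) = (\sigma\circ\tau^{-1})^{-1} \times g\circ\tau^{-1}$. From the assumptions on $\tau$ we have that $\sigma$,
$(\sigma\circ\tau^{-1})^{-1}$, $\tau$ and $\tau^{-1}$ are all $\lfloor\alpha\rfloor + 1$ times continuously differentiable and the first derivatives of the two latter
do not vanish. The equivalence (21) then follows by Lemma 5 in the appendix. □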

Example 1(a). Exponential Mixture (continued). In Example 1(a) of continuous exponential mixtures, the operator $T$ is given by (…), that is $\sigma(t) = 1/\sqrt{t}$ and $\tau(t) = -\log t$, and further $H' = L^2([e^{-b}, e^{-a}])$. Both $\sigma$ and $\tau$ are infinitely continuously differentiable on $[a', b'] = [e^{-b}, e^{-a}]$ if $a > 0$, and thus the equivalence given in (20) holds.

Example 1(b). Exponential Mixture (continued). For the estimator exhibited in Example 1(b) for exponential mixtures, the isometry $T$ is such that $\sigma(t) = \tau(t) = 1/t$ with $a' = 1/b$ and $b' = 1/a$. Hence, the conclusion of Theorem 2 holds if $a > 0$.
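As a quick sanity check of the isometry property used in Examples 1(a) and 1(b), the following Python sketch compares $\|g\|_H^2$ with $\|Tg\|_{H'}^2$ for both choices of $(\sigma, \tau)$, using Gauss-Legendre quadrature. The interval $[a,b] = [0.5, 2]$, the test function `g` and the helper names are arbitrary assumptions made for illustration only.

```python
import numpy as np

def gauss_legendre(lo, hi, n=200):
    """Gauss-Legendre nodes and weights mapped from [-1, 1] to [lo, hi]."""
    x, w = np.polynomial.legendre.leggauss(n)
    return 0.5 * (hi - lo) * x + 0.5 * (hi + lo), 0.5 * (hi - lo) * w

def sq_norm(func, lo, hi):
    """Squared L2([lo, hi]) norm of func, computed by quadrature."""
    nodes, weights = gauss_legendre(lo, hi)
    return float(np.sum(weights * func(nodes) ** 2))

a, b = 0.5, 2.0                                  # assumed interval [a, b]
g = lambda t: np.cos(3 * t) + t ** 2             # arbitrary test function on [a, b]
norm_H = sq_norm(g, a, b)

# Example 1(a): sigma(s) = 1/sqrt(s), tau(s) = -log(s), on [a', b'] = [e^{-b}, e^{-a}]
Tg_a = lambda s: g(-np.log(s)) / np.sqrt(s)
norm_a = sq_norm(Tg_a, np.exp(-b), np.exp(-a))

# Example 1(b): sigma(s) = tau(s) = 1/s, on [a', b'] = [1/b, 1/a]
Tg_b = lambda s: g(1.0 / s) / s
norm_b = sq_norm(Tg_b, 1.0 / b, 1.0 / a)

print(norm_H, norm_a, norm_b)   # the three values agree up to quadrature error
```

In both examples $\sigma = \sqrt{|\tau'|}$, which is exactly what makes the change of variables $u = \tau(s)$ norm-preserving.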

Example 2 and 3. Gamma Shape Mixture and Scale Mixture of Beta Distributions (continued). In the case of a Gamma shape mixture given in Example 2 as well as in the case of a scale mixture of Beta distributions or uniform distributions considered in Example 3, the transform $T$ is the identity and hence Theorem 2 applies. However, this result is also obtained by Corollary 7.25 in Ditzian & Totik (1987).



5. Lower Bound of the Minimax Risk 



Our goal in this section is to find a lower bound of the minimax risk

$$\inf_{\hat f \in S_n}\ \sup_{f \in \mathcal{C}} \pi_f^{\otimes n}\|\hat f - f\|_H^2 ,$$

where $S_n$ is the set of all Borel functions from $\mathsf{X}^n$ to $H$, $\mathcal{C}$ denotes a subset of densities in $H_1$ and $\pi_f^{\otimes n}$ denotes the joint distribution of the sample $(X_1, \dots, X_n)$ under Assumption 1. We first provide a general lower bound, which is then used to investigate the minimax rate in the specific cases of exponential mixtures, Gamma shape mixtures and mixtures of compactly supported scale families.
5.1. A General Lower Bound for Mixture Densities. We now present a new lower bound for the minimax risk of mixture density estimation. As Proposition 2 in Roueff & Rydén (2005), it relies on the mixture structure. However, in contrast with this previous result, which only applies to mixtures of discrete distributions, we will use the following lower bound in the case of mixtures of exponential distributions, Gamma shape mixtures and scale mixtures of compactly supported densities.

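Theorem 3 (Lower bound). Let $f_0 \in H_1$ and $f_* \in H$ with $\|f_*\|_H \le 1$ and $f_0 \pm f_* \in H_1$. Then the following lower bound holds, for any $c \in (0,1)$,

$$\inf_{\hat f \in S_n}\ \sup_{f \in \{f_0,\, f_0 \pm f_*\}} \pi_f^{\otimes n}\|\hat f - f\|_H^2 \ \ge\ c\,\|f_*\|_H^2 - \frac{2c}{1-c}\left[\left(1 + \int |\pi_{f_*}(x)|\,\zeta(dx)\right)^n - 1\right], \tag{22}$$

where $\pi_f^{\otimes n}$ denotes the joint distribution of the sample $(X_1, \dots, X_n)$ under Assumption 1.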



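Proof. Let $f_*$ be as in the theorem. For a fixed $\hat f \in S_n$ and any $c \in (0,1)$, define $A = \{\|f_0 - \hat f\|_H^2 \le c/(1-c)\}$. Then, for all $\hat f \in S_n$, $\sup_{f \in \{f_0,\, f_0 \pm f_*\}} \pi_f^{\otimes n}\|\hat f - f\|_H^2$ is bounded from below by

$$\begin{aligned}
&\frac{c}{2}\,\pi_{f_0+f_*}^{\otimes n}\|f_0+f_*-\hat f\|_H^2 + \frac{c}{2}\,\pi_{f_0-f_*}^{\otimes n}\|f_0-f_*-\hat f\|_H^2 + (1-c)\,\pi_{f_0}^{\otimes n}\|\hat f - f_0\|_H^2 \\
&\quad\ge \frac{c}{2}\,\pi_{f_0+f_*}^{\otimes n}\big[\mathbb{1}_A\|f_0+f_*-\hat f\|_H^2\big] + \frac{c}{2}\,\pi_{f_0-f_*}^{\otimes n}\big[\mathbb{1}_A\|f_0-f_*-\hat f\|_H^2\big] + (1-c)\,\pi_{f_0}^{\otimes n}\|\hat f - f_0\|_H^2 .
\end{aligned}$$

Since $f \mapsto \pi_f$ is linear, $\pi_{f_0 \pm f_*} = \pi_{f_0} \pm \pi_{f_*}$, so that, for a function $k$ defined on $\mathsf{X}^n$, we have

$$\pi_{f_0 \pm f_*}^{\otimes n}(k) = \sum_{I,J} \int k(x_1, \dots, x_n) \prod_{i \in I} \pi_{f_0}(x_i) \prod_{j \in J} \big[\pm \pi_{f_*}(x_j)\big] \prod_{i=1}^n \zeta(dx_i),$$

where the sum is taken over all sets $I$ and $J$ such that $I \cup J = \{1, \dots, n\}$ and $I \cap J = \emptyset$. Therefore,

$$\begin{aligned}
&\pi_{f_0+f_*}^{\otimes n}\big[\mathbb{1}_A\|f_0+f_*-\hat f\|_H^2\big] + \pi_{f_0-f_*}^{\otimes n}\big[\mathbb{1}_A\|f_0-f_*-\hat f\|_H^2\big]\\
&\quad= \sum_{I,J} \int \prod_{i \in I} \pi_{f_0}(x_i) \prod_{j \in J} \pi_{f_*}(x_j)\, \mathbb{1}_A \Big[\|f_0+f_*-\hat f\|_H^2 + (-1)^{\#J}\|f_0-f_*-\hat f\|_H^2\Big] \prod_{i=1}^n \zeta(dx_i).
\end{aligned}$$

Since $\|f_*\|_H \le 1$ and, on $A$, $\|f_0 - \hat f\|_H \le \sqrt{c/(1-c)}$, we obtain that, on $A$, $\|f_0 \pm f_* - \hat f\|_H \le \|f_0 - \hat f\|_H + \|f_*\|_H \le \sqrt{c/(1-c)} + 1 \le \sqrt{2/(1-c)}$. This implies that the absolute value of the sum in the last display taken over all sets $I$ and $J$ such that the cardinality of $J$ is positive, $\#J \ge 1$, is at most

$$\frac{4}{1-c} \sum_{I,J:\,\#J \ge 1} \int \prod_{i \in I} \pi_{f_0}(x_i) \prod_{j \in J} |\pi_{f_*}(x_j)| \prod_{i=1}^n \zeta(dx_i) = \frac{4}{1-c} \sum_{I,J:\,\#J \ge 1} \left(\int |\pi_{f_*}(x)|\,\zeta(dx)\right)^{\#J} = \frac{4}{1-c}\left[\left(1 + \int |\pi_{f_*}(x)|\,\zeta(dx)\right)^n - 1\right],$$

where we used that $\pi_{f_0}$ is a probability density. Moreover, the term with $\#J = 0$ writes

$$\pi_{f_0}^{\otimes n}\Big(\mathbb{1}_A\big(\|f_0+f_*-\hat f\|_H^2 + \|f_0-f_*-\hat f\|_H^2\big)\Big) = 2\,\pi_{f_0}^{\otimes n}\Big(\mathbb{1}_A\big(\|f_0-\hat f\|_H^2 + \|f_*\|_H^2\big)\Big),$$

by the parallelogram law. By combining these results, the minimax risk is bounded from below by

$$\pi_{f_0}^{\otimes n}\Big[(1-c)\|f_0-\hat f\|_H^2 + c\,\mathbb{1}_A\big(\|f_0-\hat f\|_H^2 + \|f_*\|_H^2\big)\Big] - \frac{2c}{1-c}\left[\left(1 + \int |\pi_{f_*}(x)|\,\zeta(dx)\right)^n - 1\right].$$

Finally we see that

$$\begin{aligned}
(1-c)\|f_0-\hat f\|_H^2 + c\,\mathbb{1}_A\big(\|f_0-\hat f\|_H^2 + \|f_*\|_H^2\big) &= c\,\mathbb{1}_A\|f_*\|_H^2 + \big((1-c) + c\,\mathbb{1}_A\big)\|f_0-\hat f\|_H^2\\
&\ge c\,\mathbb{1}_A\|f_*\|_H^2 + c\,\mathbb{1}_{A^c} \ \ge\ c\,\|f_*\|_H^2,
\end{aligned}$$

where we used that $(1-c)\|f_0-\hat f\|_H^2 > c$ on $A^c$ and $1 \ge \|f_*\|_H^2$. This yields the lower bound asserted in the theorem. □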
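The two algebraic facts behind this proof can be checked mechanically. The Python sketch below enumerates all splittings $\{1, \dots, n\} = I \cup J$ with $J$ non-empty and confirms that $\sum_{\#J \ge 1} \delta^{\#J} = (1+\delta)^n - 1$, where $\delta$ plays the role of $\int |\pi_{f_*}|\,d\zeta$ and each index in $I$ contributes the factor $\int \pi_{f_0}\,d\zeta = 1$; it also checks the parallelogram law on random vectors standing in for $f_0 - \hat f$ and $f_*$. The values of $n$, $\delta$ and the vectors are arbitrary.

```python
import random
from itertools import combinations
from math import isclose

def sum_over_nonempty_J(n, delta):
    """Sum of delta**#J over all disjoint splittings {0..n-1} = I u J with J non-empty."""
    total = 0.0
    for size in range(1, n + 1):
        for _ in combinations(range(n), size):   # each choice of J determines I
            total += delta ** size
    return total

def sq_norm(v):
    return sum(x * x for x in v)

n, delta = 8, 0.03
lhs = sum_over_nonempty_J(n, delta)
rhs = (1 + delta) ** n - 1
print(lhs, rhs)

random.seed(1)
u = [random.gauss(0, 1) for _ in range(5)]   # stands in for f0 - f_hat
v = [random.gauss(0, 1) for _ in range(5)]   # stands in for f_*
plus = sq_norm([x + y for x, y in zip(u, v)])
minus = sq_norm([x - y for x, y in zip(u, v)])
print(plus + minus, 2 * (sq_norm(u) + sq_norm(v)))   # parallelogram law
```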



5.2. Application to Polynomial Approximation Classes. The lower bound given in (22) relies on the choice of a function $f_*$ such that $f_0$ and $f_0 \pm f_*$ are in the smoothness class of interest. In this subsection, we give conditions which provide a tractable choice of $f_*$ with $\|f_*\|_H \le 1$ for the class $\mathcal{C}(\alpha, C)$. Following the same lines as Theorem 1 in Roueff & Rydén (2005), the key idea consists in restricting our choice using the space $V_m^\perp$ (the orthogonal complement of $V_m$ in $H$) and in controlling separately the two terms that appear in the right-hand side of (22) within this space.

An important constraint on $f_*$ is that $f_0 \pm f_* \in H_1$. In particular, for controlling the sign of $f_0 \pm f_*$, we use the following semi-norm on $H$,

$$\|f\|_{\infty, f_0} = \operatorname*{ess\,sup}_{t \in \Theta} \frac{|f(t)|}{f_0(t)},$$

with the convention $0/0 = 0$ and $s/0 = \infty$ for $s > 0$. Further, for any subspace $V$ of $H$, we denote

$$K_{\infty, f_0}(V) = \sup\{\|f\|_{\infty, f_0} : f \in V,\ \|f\|_H = 1\}.$$

The following lemma will serve to optimize the term $\|f_*\|_H$ on the right-hand side of (22). It is similar to Lemma 2 in Roueff & Rydén (2005), so we omit its proof.



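Lemma 2. Suppose that Assumption 2 holds. Let $f_0$ be in $H_1$, $\alpha, C_0 > 0$, $K \le 1$ and let $\mathcal{C}(\alpha, C_0)$ be defined by (9) with $V_m$ given by (…). Let moreover $w \in H$. Then there exists $g \in \mathcal{C}(\alpha, C_0) \cap V_m^\perp \cap w^\perp$ such that $\|g\|_{\infty, f_0} \le K$ and

$$\|g\|_H = \min\left\{C_0\,(m+1)^{-\alpha},\ \frac{K}{K_{\infty, f_0}\big(V_{m+2} \cap V_m^\perp \cap w^\perp\big)}\right\}.$$

Under Assumption 3, where the orthonormal functions $\psi_k$ are related to polynomials in some space $H' = L^2([a', b'])$, the constant $K_{\infty, f_0}(V_{m+2} \cap V_m^\perp \cap w^\perp)$ can be bounded by $K_{\infty, f_0}(V_{m+2})$, which is in turn controlled by the following lemma.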

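Lemma 3. Suppose that Assumption 3 holds. Let $f_0$ be in $H_1$ and suppose that

$$\sup\Big\{\|f\|_{\infty, f_0} : f \in H \text{ such that } \sup_{t \in [a', b']} |Tf(t)| \le 1\Big\} < \infty. \tag{23}$$

Then there exists a constant $C_0 > 0$ satisfying

$$K_{\infty, f_0}(V_{m+2}) \le C_0\, m, \quad \text{for all } m \ge 1 .$$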
Proof. Note that $\{Tf : f \in V_m\}$ is the set of polynomials in $H'$ of degree at most $m-1$, denoted by $\mathcal{P}_{m-1}$. Using $\|f\|_H = \|Tf\|_{H'}$ and denoting by $B$ the left-hand side of (23), we have

$$\begin{aligned}
K_{\infty, f_0}(V_m) &= \sup\{\|f\|_{\infty, f_0} : f \in V_m,\ \|f\|_H = 1\}\\
&\le B \sup\Big\{\sup_{t \in [a', b']} |Tf(t)| : f \in V_m,\ \|f\|_H = 1\Big\}\\
&= B \sup\Big\{\sup_{t \in [a', b']} |p(t)| : p \in \mathcal{P}_{m-1},\ \int_{a'}^{b'} p^2(t)\,dt = 1\Big\}.
\end{aligned}$$

By the Nikolskii inequality (see e.g. DeVore & Lorentz (1993), Theorem 4.2.6), there exists a constant $C > 0$ such that the latter sup is at most $Cm$. Hence, there exists $C_0 > 0$ such that $K_{\infty, f_0}(V_m) \le C_0 m$ for all $m \ge 1$; since $m + 2 \le 3m$, the bound for $V_{m+2}$ follows after replacing $C_0$ by $3C_0$. □
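The linear growth in $m$ given by the Nikolskii inequality can be seen explicitly on Legendre polynomials. In an orthonormal basis $\phi_0, \dots, \phi_{m-1}$ of $\mathcal{P}_{m-1} \subset L^2([a', b'])$, the Cauchy-Schwarz inequality gives $\sup\{|p(t)| : p \in \mathcal{P}_{m-1},\ \|p\| = 1\} = (\sum_{k<m} \phi_k(t)^2)^{1/2}$, and for shifted orthonormal Legendre polynomials the maximum over $t$ equals $m/\sqrt{b'-a'}$, attained at the end points. The Python sketch below evaluates this quantity on a grid, for the arbitrary choice $[a', b'] = [e^{-2}, e^{-1}]$.

```python
import numpy as np
from numpy.polynomial import legendre

def sup_norm_ratio(m, lo, hi, grid=2001):
    """max over t of sup{|p(t)| : deg p <= m - 1, ||p||_{L2([lo, hi])} = 1}."""
    t = np.linspace(lo, hi, grid)
    u = (2 * t - lo - hi) / (hi - lo)                  # map [lo, hi] onto [-1, 1]
    scale = np.sqrt((2 * np.arange(m) + 1) / (hi - lo))
    phi = legendre.legvander(u, m - 1) * scale         # orthonormal basis on [lo, hi]
    return float(np.sqrt(np.max(np.sum(phi ** 2, axis=1))))

lo, hi = np.exp(-2.0), np.exp(-1.0)
for m in (1, 2, 5, 10, 20):
    print(m, sup_norm_ratio(m, lo, hi), m / np.sqrt(hi - lo))
```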



Theorem 3 and Lemmas 2 and 3 yield the following result.

Corollary 2. Let $\alpha > 1$ and $C > (b-a)^{-1/2}$. Suppose that Assumption 3 holds with an isometry $T$ satisfying the assumptions of Theorem 2. Let $w$ be an $[\alpha]+1$ times continuously differentiable function defined on $[a,b]$ and set

$$v_m = \sup_{g \in V_m^\perp,\ \|g\|_H \le 1} \int |\pi_{wg}(x)|\,\zeta(dx). \tag{24}$$

Then there exist a small enough $C_* > 0$ and $C^* > 0$ such that, for any sequence $(m_n)$ of integers increasing to $\infty$ satisfying $v_{m_n} \le C_*\, n^{-1} m_n^{-\alpha}$, we have

$$\inf_{\hat f \in S_n}\ \sup_{f \in \widetilde{\mathcal{C}}(\alpha, C) \cap H_1} \pi_f^{\otimes n}\|\hat f - f\|_H^2 \ge C^*\, m_n^{-2\alpha}\,(1 + o(1)), \tag{25}$$

where $\widetilde{\mathcal{C}}(\alpha, C)$ is the smoothness class defined by (19).

Remark 3. The assumption $C > (b-a)^{-1/2}$ is necessary, otherwise $\widetilde{\mathcal{C}}(\alpha, C) \cap H_1$ is reduced to one density for $C = (b-a)^{-1/2}$ and is empty for $C < (b-a)^{-1/2}$. To see why, observe that for any $f \in H_1$, by Jensen's inequality, $\|f\|_H^2 = \int f^2(t)\,dt \ge (b-a)^{-1}$, with equality implying that $f$ is the uniform density on $[a, b]$.
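The inequality $\|f\|_H^2 \ge (b-a)^{-1}$ for densities on $[a,b]$, with equality for the uniform density, is easy to probe numerically. The sketch below normalizes a few randomly generated positive functions into densities on the arbitrary interval $[1, 3]$ and compares their squared $L^2$ norms with $1/(b-a)$; the random family of shapes is an assumption chosen only for illustration.

```python
import numpy as np

a, b = 1.0, 3.0
x, w = np.polynomial.legendre.leggauss(100)
t = 0.5 * (b - a) * x + 0.5 * (a + b)        # quadrature nodes on [a, b]
w = 0.5 * (b - a) * w                        # quadrature weights on [a, b]

def sq_norm_of_density(values):
    """Normalize positive values into a density on [a, b]; return its squared L2 norm."""
    dens = values / np.sum(w * values)
    return float(np.sum(w * dens ** 2))

rng = np.random.default_rng(0)
norms = []
for _ in range(5):
    c0, c1, c2 = rng.normal(size=3)
    norms.append(sq_norm_of_density(np.exp(c0 + c1 * t + c2 * t ** 2)))

uniform_norm = sq_norm_of_density(np.ones_like(t))
print(norms, uniform_norm, 1 / (b - a))
```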

Proof. We apply Theorem 3 with $f_0$ set as the uniform density on $[a, b]$ and $f_*$ chosen as follows. For some $C_0 > 0$ and an integer $m$ to be determined later, we choose $f_* = wg$ where $g$ is given by Lemma 2 with $K = \min\big(1, (\sup_{t \in [a,b]} |w(t)|)^{-1}\big)$. Since $g \in w^\perp$ and $\|g\|_{\infty, f_0} \le K$, we get that $f_0 \pm f_* \in H_1$.

Now we show that $\{f_0, f_0 \pm f_*\} \subset \widetilde{\mathcal{C}}(\alpha, C)$ for a well chosen $C_0$. We have $\|f_0\|_H = (b-a)^{-1/2}$ and, since the symmetric differences of all orders vanish on $f_0$, we get that $f_0 \in \widetilde{\mathcal{C}}(\alpha, (b-a)^{-1/2})$. By definition of $g$ in Lemma 2, Theorem 2 and Lemma 5 successively, we get that $f_* \in \widetilde{\mathcal{C}}(\alpha, C_1' C_0)$ for some $C_1' > 0$ not depending on $C_0$. Choosing $C_0 = (C - (b-a)^{-1/2})/C_1'$, we finally get that

$$\{f_0, f_0 \pm f_*\} \subset \widetilde{\mathcal{C}}(\alpha, C) \cap H_1 .$$

By Lemma 2, $\|g\|_H \to 0$ as $m \to \infty$ and, since $w$ is bounded, this implies that $\|f_*\|_H \le 1$ for $m$ large enough. Hence we may apply Theorem 3 and, to conclude the proof, it remains to provide a lower bound of the right-hand side of (22) for the above choice of $f_*$. Under the assumptions of Theorem 2, Condition (23) clearly holds. So Lemma 3 and the definition of $g$ in Lemma 2 give that

$$\|g\|_H \ge C_0'\, m^{-\alpha},$$

for some constant $C_0' > 0$. By definition of $v_m$ and since $g \in V_m^\perp$, we have

$$\int |\pi_{f_*}(x)|\,\zeta(dx) \le \|g\|_H\, v_m \le C_0\, m^{-\alpha}\, v_m .$$

We now apply the lower bound given by (22) with $m = m_n$ for $(m_n)$ satisfying $v_{m_n} \le C_* n^{-1} m_n^{-\alpha}$. We thus obtain

$$\inf_{\hat f \in S_n}\ \sup_{f \in \widetilde{\mathcal{C}}(\alpha, C) \cap H_1} \pi_f^{\otimes n}\|\hat f - f\|_H^2 \ge c\,(C_0' m_n^{-\alpha})^2 - \frac{2c}{1-c}\, C_0 C_*\, m_n^{-2\alpha}(1 + o(1)) \ge C^* m_n^{-2\alpha}(1 + o(1)),$$

where the last inequality holds for some $C^* > 0$ provided that $C_*$ is small enough. □

To apply Corollary 2, one needs to investigate the asymptotic behavior of the sequence $(v_m)$ defined in (24). The following lemma can be used to achieve this goal.

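Lemma 4. Under Assumption 3, if $\pi_\cdot(x) \in H$ for all $x \in \mathsf{X}$, then $v_m$ defined in (24) satisfies

$$v_m \le \int \big\|T[w\pi_\cdot(x)] - P_{\mathcal{P}_{m-1}}\big(T[w\pi_\cdot(x)]\big)\big\|_{H'}\,\zeta(dx), \tag{26}$$

where $\mathcal{P}_{m-1}$ is the set of polynomials of degree at most $m-1$ in $H'$ and $P_{\mathcal{P}_{m-1}}$ denotes the orthogonal projection in $H'$ onto $\mathcal{P}_{m-1}$.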

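Proof. Let $g \in V_m^\perp$ such that $\|g\|_H \le 1$. Then we have, for all $x \in \mathsf{X}$,

$$\pi_{wg}(x) = \langle wg, \pi_\cdot(x)\rangle_H = \langle g, w\pi_\cdot(x)\rangle_H = \langle Tg, T[w\pi_\cdot(x)]\rangle_{H'} .$$

Recall that $TV_m = \mathcal{P}_{m-1}$ is the set of polynomials of degree at most $m-1$ in $H'$. Hence, $Tg$ is orthogonal to $\mathcal{P}_{m-1}$, and for any $p \in \mathcal{P}_{m-1}$, we get, for all $x \in \mathsf{X}$,

$$|\pi_{wg}(x)| = \big|\langle Tg, T[w\pi_\cdot(x)] - p\rangle_{H'}\big| \le \big\|T[w\pi_\cdot(x)] - p\big\|_{H'}, \tag{27}$$

where we used the Cauchy-Schwarz inequality and $\|Tg\|_{H'} = \|g\|_H \le 1$. Now the bound given by (26) is obtained by taking $p$ equal to the projection of $T[w\pi_\cdot(x)]$ onto $\mathcal{P}_{m-1}$ (observe that the right-hand side of (27) is then minimal), integrating over $x$ with respect to $\zeta$ and taking the supremum over $g$. □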
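For the exponential mixture treated in Theorem 4 below, the right-hand side of (26) can be evaluated numerically. Assuming $w \equiv 1$, $\Theta = [1, 2]$ and $[T\pi_\cdot(x)](t) = -\log(t)\,t^{x-1/2}$ on $[a', b'] = [e^{-2}, e^{-1}]$ (the form derived in that proof), the Python sketch below projects onto polynomials of degree at most $m-1$ with an orthonormal Legendre basis and integrates the residual norm over $x$. Truncating the $x$-integral at 40 and the quadrature sizes are illustrative assumptions, not part of the argument.

```python
import numpy as np
from numpy.polynomial import legendre

def projection_residual_norm(fvals, u, w, m, length):
    """L2 norm of f minus its orthogonal projection onto polynomials of degree <= m - 1.

    fvals: values of f at the quadrature nodes; u: the same nodes mapped to [-1, 1];
    w: quadrature weights on the original interval, of the given length.
    """
    scale = np.sqrt((2 * np.arange(m) + 1) / length)
    phi = legendre.legvander(u, m - 1) * scale     # orthonormal Legendre basis at the nodes
    coef = phi.T @ (w * fvals)                     # inner products <f, phi_k>
    resid = fvals - phi @ coef
    return float(np.sqrt(np.sum(w * resid ** 2)))

def lemma4_bound(m, a=1.0, b=2.0, x_max=40.0, n_x=801, n_t=120):
    """Right-hand side of (26) for the exponential mixture with w = 1 (x-integral truncated)."""
    lo, hi = np.exp(-b), np.exp(-a)                # [a', b'] = [e^{-b}, e^{-a}]
    u, wu = legendre.leggauss(n_t)
    t = 0.5 * (hi - lo) * u + 0.5 * (hi + lo)
    w = 0.5 * (hi - lo) * wu
    xs = np.linspace(0.0, x_max, n_x)
    vals = np.array([
        projection_residual_norm(-np.log(t) * t ** (x - 0.5), u, w, m, hi - lo)
        for x in xs
    ])
    # trapezoidal rule in x; the integrand decays like exp(-a x)
    return float(np.sum(0.5 * (vals[1:] + vals[:-1]) * np.diff(xs)))

bounds = [lemma4_bound(m) for m in (1, 2, 4, 8)]
print(bounds)   # decreases quickly with m
```

The projection is computed with the discrete inner product of the quadrature rule, so the residual is automatically non-increasing in $m$, mirroring the nested spaces $\mathcal{P}_{m-1}$.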

5.3. Minimax Rate for Exponential Mixtures. In this section, we show that in the case of exponential 
mixtures the orthogonal series estimator of Example 1(a) achieves the minimax rate. 

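Theorem 4. Consider the exponential case, that is, let Assumption 1 hold with $\zeta$ defined as the Lebesgue measure on $\mathbb{R}_+$, $\Theta = [a, b] \subset (0, \infty)$ and $\pi_t(x) = t e^{-tx}$. Let $C > (b-a)^{-1/2}$ and $\alpha > 1$ and define $\widetilde{\mathcal{C}}(\alpha, C)$ as in (19). Then there exists $C^* > 0$ such that

$$\inf_{\hat f \in S_n}\ \sup_{f \in \widetilde{\mathcal{C}}(\alpha, C) \cap H_1} \pi_f^{\otimes n}\|\hat f - f\|_H^2 \ge C^*(\log n)^{-2\alpha}(1 + o(1)). \tag{28}$$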

Proof. Let $g_k(x) = \mathbb{1}\{x > k - 1\}$, for $k \ge 1$. Then Assumption 3 holds with $\psi_k$ and $T$ defined by (…) and (…), respectively. Since $a > 0$, $T$ satisfies the assumptions of Theorem 2. We may thus apply Corollary 2 with $w \equiv 1$. Hence the minimax lower bound given in (28) follows from (25), provided that we have, for some constant $C > 0$, setting $m_n = C \log n$,

$$v_{m_n} = o(n^{-1} m_n^{-\alpha}) \quad \text{as } n \to \infty, \tag{29}$$

where $v_m$ is defined by (24). Note that $\pi_t(x) = t e^{-xt}\,\mathbb{1}_{\mathbb{R}_+}(x)$. We apply Lemma 4 to bound $v_m$. Using the definition of $T$ in (…), we have for all $x > 0$, $[T\pi_\cdot(x)](t) = -\log(t)\, t^{x - 1/2}$. We write $x \in \mathbb{R}_+$ as the sum of its integer and fractional parts, $x = [x] + \langle x \rangle$, and observe that, since $\langle x \rangle - 1/2 \in [-1/2, 1/2)$ and $[a', b'] = [e^{-b}, e^{-a}] \subset (0, 1)$, the expansion of $t^{\langle x \rangle - 1/2} = \sum_{k \ge 0} \alpha_k(x)(1 - t)^k$ as a power series about $t = 1$ satisfies $|\alpha_k(x)| = \prod_{i=1}^k |\langle x \rangle - 1/2 - i + 1|/i \le 1$. Expanding $-\log t$ about $t = 1$, we thus get $-\log(t)\, t^{\langle x \rangle - 1/2} = \sum_{k \ge 0} \beta_k(x)(1 - t)^k$ with $|\beta_k(x)| = |\sum_{l=1}^k \alpha_{k-l}(x)/l| \le 1 + \log(k)$. For any $x < m$, we use this expansion to approximate $[T\pi_\cdot(x)](t) = -\log(t)\, t^{\langle x \rangle - 1/2} \times t^{[x]}$ by a polynomial of degree $m - 1$. Namely, setting $\rho = 1 - a' = 1 - e^{-b}$, we obtain

$$\sup_{t \in [a', b']} \Big|[T\pi_\cdot(x)](t) - \sum_{k=0}^{m-1-[x]} \beta_k(x)(1 - t)^k\, t^{[x]}\Big| \le \sum_{k \ge m - [x]} (1 + \log(k))\, \rho^k\, (b')^{[x]} \le C_1 c^m,$$

where we used the bound $1 + \log(k) \le C_1'(c/\rho)^k$, valid for some constants $C_1' > 0$ and $c \in (\max(\rho, b'), 1)$ not depending on $x$, and $C_1 > 0$ is a constant not depending on $x$ either. This bound, up to the factor $\sqrt{b' - a'}$, also applies to $\|T\pi_\cdot(x) - P_{\mathcal{P}_{m-1}}(T\pi_\cdot(x))\|_{H'}$ by definition of the projection $P_{\mathcal{P}_{m-1}}$. For $x \ge m$, we simply observe that $|[T\pi_\cdot(x)](t)| \le -\log(a')\, (b')^{x - 1/2}$. This also provides an upper bound for $\|T\pi_\cdot(x) - P_{\mathcal{P}_{m-1}}(T\pi_\cdot(x))\|_{H'}$. Finally, integrating on $x > 0$ we get

$$\int_{\mathbb{R}_+} \big\|T\pi_\cdot(x) - P_{\mathcal{P}_{m-1}}(T\pi_\cdot(x))\big\|_{H'}\, dx \le C_2\, m\, c^m,$$

with constants $C_2 > 0$ and $c < 1$ not depending on $m$, and this upper bound applies to $v_m$ by Lemma 4. This shows that (29) holds provided that $C > 0$ is taken large enough. This completes the proof. □
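The two coefficient bounds used in this proof, $|\alpha_k(x)| \le 1$ and $|\beta_k(x)| \le 1 + \log(k)$, can be checked numerically. In the Python sketch below, the exponent $e = \langle x \rangle - 1/2$ ranges over a grid of $[-1/2, 1/2)$; $\alpha_k$ is computed from the binomial series $t^{e} = \sum_k \binom{e}{k}(-1)^k (1-t)^k$, and $\beta_k = \sum_{l=1}^k \alpha_{k-l}/l$ comes from multiplying by $-\log t = \sum_{l \ge 1} (1-t)^l/l$. The grid and the truncation order $K$ are arbitrary choices.

```python
import math

def alpha_coeffs(e, K):
    """alpha_0..alpha_K with t**e = sum_k alpha_k (1 - t)**k (binomial series about t = 1)."""
    coeffs, c = [], 1.0
    for k in range(K + 1):
        coeffs.append(c)
        c *= -(e - k) / (k + 1)          # next coefficient binom(e, k+1) * (-1)**(k+1)
    return coeffs

def beta_coeffs(e, K):
    """beta_0..beta_K with -log(t) * t**e = sum_k beta_k (1 - t)**k."""
    al = alpha_coeffs(e, K)
    return [sum(al[k - l] / l for l in range(1, k + 1)) for k in range(K + 1)]

K = 200
grid = [-0.5 + j / 20 for j in range(20)]         # exponents <x> - 1/2 in [-1/2, 1/2)
max_alpha = max(abs(v) for e in grid for v in alpha_coeffs(e, K))
worst_beta = max(abs(bk) - (1 + math.log(k))
                 for e in grid for k, bk in enumerate(beta_coeffs(e, K)) if k >= 1)

# the series reproduces -log(t) t**e for t with |1 - t| < 1
e, t = 0.2, 0.3
series = sum(bk * (1 - t) ** k for k, bk in enumerate(beta_coeffs(e, K)))
target = -math.log(t) * t ** e
print(max_alpha, worst_beta, series, target)
```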

5.4. Minimax Rate for Gamma Shape Mixtures. In this section, we show that in the case of Gamma 
shape mixtures the orthogonal series estimator of Example 4 achieves the minimax rate up to the log log n 
multiplicative term. 

Theorem 5. Consider the Gamma shape mixture case, that is, let Assumption 1 hold with $\zeta$ defined as the Lebesgue measure on $\mathbb{R}_+$, $\Theta = [a, b] \subset (0, \infty)$ and $\pi_t(x) = x^{t-1} e^{-x}/\Gamma(t)$. Let $C > (b-a)^{-1/2}$ and $\alpha > 1$ and define $\widetilde{\mathcal{C}}(\alpha, C)$ as in (19). Then there exists $C^* > 0$ such that

$$\inf_{\hat f \in S_n}\ \sup_{f \in \widetilde{\mathcal{C}}(\alpha, C) \cap H_1} \pi_f^{\otimes n}\|\hat f - f\|_H^2 \ge C^*(\log n)^{-2\alpha}(1 + o(1)). \tag{30}$$

Proof. We proceed as in the proof of Theorem 4. This time we set $g_k(x) = \sum_{l=1}^k c_{k,l}\, x^{l-1}$ with coefficients $(c_{k,l})$ defined in Lemma 6. Assumption 3 then holds with $H' = H$ and $T$ defined as the identity operator. Applying Corollary 2 with $w(t) = \Gamma(t)$, we obtain the lower bound given in (30) provided that Condition (29) holds with $m_n = C \log n / \log\log n$ for some $C > 0$. Again we use Lemma 4 to check this condition in the present case. To this end we must, for each $x > 0$, provide a polynomial approximation of $w(t)\pi_t(x) = x^{t-1} e^{-x}$ as a function of $t$. Expanding the exponential function $x^{t-1} = e^{(t-1)\log(x)}$ as a power series, we get

$$\sup_{t \in [a, b]} \Big| x^{t-1} e^{-x} - e^{-x} \sum_{k=0}^{m-1} \frac{(\log(x))^k}{k!}(t - 1)^k \Big| \le e^{-x} \sum_{k \ge m} \frac{(c\,|\log(x)|)^k}{k!},$$

where $c = \max(|a - 1|, |b - 1|)$. Let $(x_m)$ be a sequence of real numbers tending to infinity. The right-hand side of the previous display is less than $e^{c|\log(x)| - x}(c|\log(x)|)^m/m!$. We use this for bounding $\|w\pi_\cdot(x) - P_{\mathcal{P}_{m-1}}(w\pi_\cdot(x))\|_H$ (recall that $T$ is the identity and $H' = H$) when $x \in [e^{-x_m}, x_m]$. When $x \in (0, e^{-x_m})$ we use that the latter is bounded by $O(1)$ and when $x > x_m$ by $O(e^{-x/2})$. Hence Lemma 4 gives that

$$v_m = O(e^{-x_m}) + \frac{c^m}{m!} \int_{e^{-x_m}}^{x_m} e^{c|\log(x)| - x}\, |\log(x)|^m\, dx + O(e^{-x_m/2}).$$

Now observe that, as $x_m \to \infty$, separating the integral as $\int_{e^{-x_m}}^{1} + \int_{1}^{x_m}$, we get

$$\int_{e^{-x_m}}^{x_m} e^{c|\log(x)| - x}\, |\log(x)|^m\, dx = O\big(e^{c x_m} x_m^m\big) + O\big(\log^m(x_m)\big).$$

Set $x_m = c_0 m$. By Stirling's formula, for $c_0 > 0$ small enough, we get $v_m = O(c_1^m)$ with $c_1 \in (0, 1)$. We conclude as in the proof of Theorem 4. □
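The Taylor-remainder bound used at the start of this proof can be checked numerically. The Python sketch below compares, on a grid of $t \in [a, b]$, the error of the degree-$(m-1)$ partial sum in $(t-1)$ with $e^{-x}\sum_{k \ge m}(c|\log x|)^k/k!$; the values $[a, b] = [0.5, 2]$, the sample points $x$ and the orders $m$ are arbitrary, and indexing the partial sum up to $m-1$ is our own convention.

```python
import math
import numpy as np

def taylor_error(x, m, a, b, n_t=401):
    """Max over a t-grid of [a, b] of |x**(t-1) e^{-x} - e^{-x} sum_{k<m} (log x)^k (t-1)^k / k!|."""
    t = np.linspace(a, b, n_t)
    lx = math.log(x)
    partial = sum(lx ** k * (t - 1) ** k / math.factorial(k) for k in range(m))
    return float(np.max(np.abs(x ** (t - 1) * math.exp(-x) - math.exp(-x) * partial)))

def tail_bound(x, m, a, b, extra=200):
    """e^{-x} sum_{k>=m} (c |log x|)^k / k!, with c = max(|a - 1|, |b - 1|)."""
    y = max(abs(a - 1), abs(b - 1)) * abs(math.log(x))
    term = y ** m / math.factorial(m)
    total = 0.0
    for k in range(m, m + extra):
        total += term
        term *= y / (k + 1)
    return math.exp(-x) * total

a, b = 0.5, 2.0
for x in (0.05, 0.5, 2.0, 10.0):
    for m in (2, 5, 10):
        print(x, m, taylor_error(x, m, a, b), tail_bound(x, m, a, b))
```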



5.5. Lower Bound for Compactly Supported Scale Families. We derived in Corollary 1 an upper bound of the minimax rate for estimating $f$ in $\mathcal{C}(\alpha, C)$. It is thus legitimate to investigate whether, as in the exponential mixture case, this upper bound is sharp for mixtures of compactly supported scale families. A direct application of Corollary 2 provides the following lower bound, which, unfortunately, is far from providing a complete and definite answer.

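Theorem 6. Consider the case of scale mixtures of a compactly supported density on $\mathbb{R}_+$, that is, suppose that the assumptions of Lemma 1 hold. Suppose moreover that $\pi_1$ has a $k$-th derivative bounded on $\mathbb{R}_+$. Let $C > (b-a)^{-1/2}$ and $\alpha > 1$, and define $\widetilde{\mathcal{C}}(\alpha, C)$ as in (19). Then, if $k > \alpha$, there exists $C^* > 0$ such that

$$\inf_{\hat f \in S_n}\ \sup_{f \in \widetilde{\mathcal{C}}(\alpha, C) \cap H_1} \pi_f^{\otimes n}\|\hat f - f\|_H^2 \ge C^*\, n^{-2\alpha/(k - \alpha)}(1 + o(1)). \tag{31}$$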

Proof. We proceed as in the proof of Theorem 4, that is, we observe that Assumption 3 holds with the same choice of $\psi_k$ as in Lemma 1 and apply Corollary 2 with $w \equiv 1$. Here, the lower bound given in (31) is obtained by showing that

$$v_{m_n} = O(n^{-1} m_n^{-\alpha}) \quad \text{as } n \to \infty, \tag{32}$$

holds with $m_n = n^{1/(k - \alpha)}$ and with $(v_m)$ defined by (24). Again we use Lemma 4 to bound $v_m$. Here $T$ is the identity operator on $H = H'$ and $\pi_t(x) = t^{-1}\pi_1(x/t)$. Let $M > 0$ be such that the support of $\pi_1$ is included in $[0, M]$. Then for $t \in [a, b]$ and $x > Mb$, $\pi_t(x) = 0$. Hence

$$\big\|\pi_\cdot(x) - P_{\mathcal{P}_{m-1}}\pi_\cdot(x)\big\|_H = 0 \quad \text{for all } x > Mb . \tag{33}$$

We now consider the case $x \le Mb$. By the assumption on $\pi_1$ and $\Theta$, we have that $t \mapsto \pi_t(x) = t^{-1}\pi_1(x/t)$ is $k$ times differentiable on $[a, b]$. Moreover its $k$-th derivative is bounded by $C_k(1 + x^k)$ on $[a, b]$, where $C_k > 0$ does not depend on $x$. It follows that, for any $h > 0$ and $t \in [a, b]$,

$$\big|\Delta_h^k(\pi_\cdot(x), t)\big| \le C_k'(1 + x^k)\, h^k ,$$

where $\Delta_h^k$ is the $k$-th order symmetric difference operator defined by (16) and $C_k' > 0$ does not depend on $x$. Observing moreover that

$$\|\pi_\cdot(x)\|_H^2 = \int_a^b t^{-2}\pi_1^2(x/t)\, dt \le C,$$

for some $C > 0$ not depending on $x$, we get that $\pi_\cdot(x) \in \widetilde{\mathcal{C}}\big(k, C'(1 + x^k)\big)$ for some $C' > 0$ not depending on $x$. Using Corollary 7.25 in Ditzian & Totik (1987), we thus have, for a constant $C'' > 0$ not depending on $x$,

$$\big\|\pi_\cdot(x) - P_{\mathcal{P}_{m-1}}\pi_\cdot(x)\big\|_H \le C''(1 + x^k)\, m^{-k} \quad \text{for all } x \le Mb . \tag{34}$$

Applying Lemma 4 with (33) and (34), we obtain $v_m = O(m^{-k})$. We conclude that (32) holds with $m_n = n^{1/(k - \alpha)}$, which completes the proof. □



Theorem 6 provides polynomial lower bounds of the minimax MISE rate whereas Corollary 1 gives logarithmic upper bounds in the same smoothness spaces. Hence the question of the minimax rate is left completely open in this case. Moreover the lower bound relies on smoothness conditions on $\pi_1$ which rule out Example 3 (for which $\pi_1$ is discontinuous). On the other hand, the case of scale families can be related to the deconvolution problem, which has received considerable attention in a series of papers of the 1990s (see e.g. Zhang (1990); Fan (1991b, 1993); Pensky & Vidakovic (1999)). The following section sheds light on this relationship.
5.6. Scale Families and Deconvolution. The following lower bound is obtained from classical lower bounds in the deconvolution problem, derived in Fan (1993).



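Theorem 7. Consider the case of scale mixtures on $\mathbb{R}_+$, that is, suppose that Assumption 1 holds with $\zeta$ equal to the Lebesgue measure on $\mathbb{R}_+$, $\Theta = [a, b] \subset (0, \infty)$ and $\pi_t(x) = t^{-1}\pi_1(x/t)$. Denote by $\phi$ the characteristic function of the density $e^t \pi_1(e^t)$ on $\mathbb{R}$,

$$\phi(\xi) = \int e^{i\xi t + t}\, \pi_1(e^t)\, dt .$$

Define $T$ as the operator $T(g) = \check g$, where $g : \mathbb{R} \to \mathbb{R}$ and $\check g(t) = t^{-1} g(\log(t))$ for all $t \in (0, \infty)$. Let $C > 0$ and $\alpha > 0$, and define $\mathcal{L}(\alpha, C)$ as the set containing all densities $g$ on $\mathbb{R}$ such that

$$\big|g^{(r)}(t) - g^{(r)}(u)\big| \le C\,|t - u|^{\alpha - r} \quad \text{for all } t, u \in \mathbb{R},$$

where $r = [\alpha]$.

(a) Assume that $\phi^{(j)}(\xi) = O(|\xi|^{-\beta - j})$ as $|\xi| \to \infty$ for $j = 0, 1, 2$, where $\phi^{(j)}$ is the $j$-th derivative of $\phi$. Then there exists $C^* > 0$ such that

$$\inf_{\hat f \in S_n}\ \sup_{f \in T(\mathcal{L}(\alpha, C))} \pi_f^{\otimes n}\|\hat f - f\|_H^2 \ge C^*\, n^{-2\alpha/(2(\alpha + \beta) + 1)}(1 + o(1)). \tag{35}$$

(b) Assume that $\phi(\xi) = O\big(|\xi|^{\beta_1} e^{-|\xi|^{\beta}/\gamma}\big)$ as $|\xi| \to \infty$ for some $\beta, \gamma > 0$ and $\beta_1$, and that $\pi_1(u) = o\big(u^{-1}|\log(u)|^{-a}\big)$ as $u \to 0$ and $u \to \infty$ for some $a > 1$. Then there exists $C^* > 0$ such that

$$\inf_{\hat f \in S_n}\ \sup_{f \in T(\mathcal{L}(\alpha, C))} \pi_f^{\otimes n}\|\hat f - f\|_H^2 \ge C^*\, \log(n)^{-2\alpha/\beta}(1 + o(1)). \tag{36}$$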



Proof. In the scale mixture case the observation $X$ can be represented as $X = \theta Y$, where $Y$ and $\theta$ are independent variables having density $\pi_1$ and (unknown) density $f$, respectively. By taking the log of the observations, the problem of estimating the density of $\log(\theta)$, that is $f^*(t) = e^t f(e^t)$, is a deconvolution problem. Hence we may apply Theorem 2 in Fan (1993) to obtain lower bounds on the nonparametric estimation of $f^*$ from $\log(X_1), \dots, \log(X_n)$ under appropriate assumptions on $\phi$, which is the characteristic function of $\log(Y)$. Let $a' = \log(a)$ and $b' = \log(b)$. The lower bounds in (a) and (b) above are those appearing in (a) and (b) of Theorem 2 in Fan (1993) for the minimax quadratic risk in $H' = L^2([a', b'])$ for estimating $f^*$ in the Lipschitz smoothness class $\mathcal{L}(\alpha, C)$. Observe that $T$ is defined for all functions $g : [a', b'] \to \mathbb{R}$ by $T(g) = \check g$ with $\check g$ defined on $[a, b]$ by $\check g(t) = t^{-1} g(\log(t))$, so that $T(f^*) = f$. Observing that $T$ is a linear operator and that, for any $g \in H'$, $\|T(g)\|_H \asymp \|g\|_{H'}$ (indeed, the change of variables $t = e^s$ gives $\|T(g)\|_H^2 = \int_{a'}^{b'} e^{-s} g^2(s)\, ds$), we obtain the lower bounds given in (35) and (36). □
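The two properties of $T$ used in this proof, $T(f^*) = f$ and the norm equivalence $e^{-b'}\|g\|_{H'}^2 \le \|T(g)\|_H^2 = \int_{a'}^{b'} e^{-s} g^2(s)\,ds \le e^{-a'}\|g\|_{H'}^2$, can be checked numerically. The Python sketch below assumes $[a, b] = [0.5, 3]$, an arbitrary test function `g` on $[a', b']$ and the uniform density as $f$; these choices are for illustration only.

```python
import numpy as np

def gauss_legendre(lo, hi, n=200):
    x, w = np.polynomial.legendre.leggauss(n)
    return 0.5 * (hi - lo) * x + 0.5 * (hi + lo), 0.5 * (hi - lo) * w

a, b = 0.5, 3.0                        # assumed support [a, b] of the mixing density
ap, bp = np.log(a), np.log(b)          # a' = log(a), b' = log(b)

def T(g):
    """T(g) = g-check, with g-check(t) = g(log t) / t."""
    return lambda t: g(np.log(t)) / t

g = lambda s: 1.0 + np.sin(2 * s) ** 2            # arbitrary test function on [a', b']
s, ws = gauss_legendre(ap, bp)
t, wt = gauss_legendre(a, b)

norm_g = np.sum(ws * g(s) ** 2)                    # ||g||^2 in H' = L2([a', b'])
norm_Tg = np.sum(wt * T(g)(t) ** 2)                # ||T g||^2 in H = L2([a, b])
weighted = np.sum(ws * np.exp(-s) * g(s) ** 2)     # same integral after t = e^s

# T maps f*(s) = e^s f(e^s) back to f; here f is the uniform density on [a, b]
f = lambda t: np.full_like(t, 1.0 / (b - a))
f_star = lambda s: np.exp(s) * f(np.exp(s))
max_gap = np.max(np.abs(T(f_star)(t) - f(t)))

print(norm_Tg, weighted, np.exp(-bp) * norm_g, np.exp(-ap) * norm_g, max_gap)
```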

As in Theorem 6, the smoother $\pi_1$ is assumed, the slower the lower bound of the minimax rate. However the lower bounds obtained in Theorem 7 hold for a much larger class of scale families. Indeed, if $\pi_1$ is compactly supported, the conditions induced on $\pi_1$ in case (a) are much weaker than in Theorem 6. For instance, they hold with $\beta = k$ for Example 3. For an infinitely differentiable $\pi_1$ both theorems say that the minimax rate is slower than any polynomial rate. However, in this case, case (b) in Theorem 7 may provide a more precise logarithmic lower bound. It is interesting to note that, as a consequence of Beurling & Malliavin (1962), the MISE rate $(\log n)^{-2\alpha}$, which is the rate obtained in Corollary 1 by the polynomial estimator for any compactly supported $\pi_1$, is the slowest possible minimax rate obtained in Theorem 7(b) for a compactly supported $\pi_1$. Such a comparison should be regarded with care since the smoothness class in the latter theorem is different and cannot be compared to the smoothness classes considered in the previous results, as we explain hereafter.

The arguments for adapting the lower bounds of Theorem 7 also apply for minimax upper bounds. More precisely, using the kernel estimators for the deconvolution problem from the observations $\log(X_1), \dots, \log(X_n)$ and mapping the estimator through $T$, one obtains an estimator of $f$ achieving the same integrated quadratic risk. The obtained rates depend on similar assumptions on $\phi$ as those in (a) and (b), see Fan (1991a,b, 1993). Although the scale mixture and the convolution model are related to one another by taking the exponential (or the logarithm in the reverse sense) of the observations, it is important to note that, except for Theorem 7, our results are of a different nature. Indeed, the upper and lower bounds in the deconvolution problem cannot be compared with those obtained previously in the paper because there are no possible inclusions between the smoothness classes considered in the deconvolution problem and those defined by polynomial approximations.

Let us examine more closely the smoothness class $T(\mathcal{L}(\alpha, C))$ that appears in the lower bounds of Theorem 7, inherited from the results on the deconvolution problem. This class contains densities with non-compact supports whereas $\widetilde{\mathcal{C}}(\alpha, C) \cap H_1$ only contains densities with supports in $[a, b]$. Hence neither (35) nor (36) can be used for deriving minimax rates in $\widetilde{\mathcal{C}}(\alpha, C) \cap H_1$. In fact the densities exhibited in Fan (1993) to prove the lower bound have infinite support by construction and the argument does not at all seem to be adaptable to a class of compactly supported densities. As for upper bounds in the deconvolution problem, they are based on Lipschitz or Sobolev type smoothness conditions which are not compatible with compactly supported densities on $[a, b]$ except for those that are smoothly decreasing close to the end points. This follows from the fact that, in the deconvolution problem, standard estimators (kernel or wavelet) rely heavily on the Fourier behavior of both the mixing density and the additive noise density. In contrast, such boundary constraints are not necessary for densities in $\widetilde{\mathcal{C}}(\alpha, C)$. For instance the uniform density on $[a, b]$ belongs to $\widetilde{\mathcal{C}}(\alpha, C)$ for all $\alpha > 0$ and $C > (b-a)^{-1/2}$, but has a Fourier transform decreasing very slowly. A natural conclusion of this observation is that polynomial estimators should be preferred to standard deconvolution estimators when the mixing density has a known compact support $[a, b] \subset (0, \infty)$. Of course this conclusion holds for both deconvolution and scale mixture problems.



Appendix A. Technical Results 

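Lemma 5. Let $\alpha > 0$, $a < b$ and $a' < b'$. Define $\widetilde{\mathcal{C}}(\alpha, C)_H$ as in (19) and $\widetilde{\mathcal{C}}'(\alpha, C)_{H'}$ similarly with $a'$ and $b'$ replacing $a$ and $b$. Let $\sigma$ be $[\alpha]+1$ times differentiable on $[a, b]$ and $\tau : [a', b'] \to [a, b]$ be $[\alpha]+1$ times differentiable on $[a', b']$ with a non-vanishing first derivative. Then

$$\{\sigma f : f \in \widetilde{\mathcal{C}}(\alpha, \cdot)_H\} \hookrightarrow \widetilde{\mathcal{C}}(\alpha, \cdot)_H \quad \text{and} \quad \{f \circ \tau : f \in \widetilde{\mathcal{C}}(\alpha, \cdot)_H\} \hookrightarrow \widetilde{\mathcal{C}}'(\alpha, \cdot)_{H'} .$$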

Proof. As the first embedding is the inclusion (40) in Roueff & Rydén (2005), we only show the second embedding. Let $f \in \widetilde{\mathcal{C}}(\alpha, C)_H$ and denote $r = [\alpha] + 1$. Let $t \in (0, 1]$. By the equivalence (18) with the $K$-functional given in (17), there exists a function $h$ such that $h^{(r-1)} \in AC_{loc}$ and

$$\|f - h\|_H + t^r \|\varphi^r h^{(r)}\|_H \le 2M\, \omega_\varphi^r(f, t)_H \le 2MC\, t^\alpha, \tag{37}$$

where $\varphi(x) = \sqrt{(x-a)(b-x)}$. Let us set $\tilde h = h \circ \tau$ and show that, for some constant $K > 0$ depending neither on $t$ nor on $C$,

$$\|f \circ \tau - \tilde h\|_{H'} + t^r \|\tilde\varphi^r \tilde h^{(r)}\|_{H'} \le KC\, t^\alpha, \tag{38}$$

where we defined $\tilde\varphi(x) = \sqrt{(x-a')(b'-x)}$, that is the same definition as $\varphi$ with $a'$ and $b'$ replacing $a$ and $b$. Using again the equivalence (18), the bound given in (38) will complete the proof of the lemma.

Note that since $\tau'$ does not vanish, denoting $C_1 = (\inf_{[a', b']} |\tau'|)^{-1/2}$, for all $g \in H$, we have

$$\|g \circ \tau\|_{H'} \le C_1 \|g\|_H . \tag{39}$$


In particular we have that

$$\|f \circ \tau - \tilde h\|_{H'} = \|(f - h) \circ \tau\|_{H'} \le C_1 \|f - h\|_H . \tag{40}$$

Since $\tau$ is $r$ times continuously differentiable and $h^{(r-1)} \in AC_{loc}$, we note that $\tilde h^{(r-1)} \in AC_{loc}$ with $\tilde h^{(r)} = \sum_{j=1}^r \tau_j \times h^{(j)} \circ \tau$, where the $\tau_j$'s are continuous functions only depending on $\tau$. Hence there is a constant $C_2 > 0$ only depending on $\tau$ and $r$ such that

$$\|\tilde\varphi^r \tilde h^{(r)}\|_{H'} \le C_2 \max_{j=1,\dots,r} \|\tilde\varphi^r \times h^{(j)} \circ \tau\|_{H'} .$$

Another simple consequence of $\tau'$ not vanishing on $[a', b']$ is that there exists a constant $C_3 > 0$ such that $\tilde\varphi(x) \le C_3\, \varphi \circ \tau(x)$ for all $x \in [a', b']$. Using this with (39) in the previous display, we get

$$\|\tilde\varphi^r \tilde h^{(r)}\|_{H'} \le C_1 C_2 C_3^r \max_{j=1,\dots,r} \|\varphi^r h^{(j)}\|_H . \tag{41}$$

We shall prove that $\|\varphi^r h^{(j)}\|_H$ appearing in the right-hand side of the previous inequality is in fact maximized, up to multiplicative and additive constants, at $j = r$. For $j = 1, \dots, r-1$, we proceed recursively as follows. For any $u \in (a, b)$, we have

$$|h^{(j)}(x)| \le \Big|\int_{[u, x]} h^{(j+1)}(s)\, ds\Big| + |h^{(j)}(u)| .$$

Then, by Jensen's inequality,

$$\|\varphi^r h^{(j)}\|_H \le \left\{\int_{x=a}^{b} \varphi^{2r}(x) \left[|x - u| \int_{s \in [u, x]} \{h^{(j+1)}(s)\}^2\, ds\right] dx\right\}^{1/2} + \|\varphi^r\|_H\, |h^{(j)}(u)| ,$$

where we used the convention that $[c, d]$ denotes the same segment whether $c \le d$ or not. By Fubini's theorem, the term between braces reads

$$\int_a^b \{h^{(j+1)}(s)\}^2\, \psi(s; u)\, ds \quad \text{with} \quad \psi(s; u) = \int_a^b \mathbb{1}_{[u, x]}(s)\, (x - a)^r (b - x)^r\, |x - u|\, dx .$$

Let $\underline{a} < \underline{b}$ be two fixed numbers in $(a, b)$. It is straightforward to show that, for some constant $C_4 > 0$ only depending on $a, b, \underline{a}, \underline{b}$, we have

$$\psi(s; u) \le C_4^2\, \varphi^{2r}(s) \quad \text{for all } u \in (\underline{a}, \underline{b}) .$$

The last three displays thus give that

$$\|\varphi^r h^{(j)}\|_H \le C_4 \|\varphi^r h^{(j+1)}\|_H + \|\varphi^r\|_H \inf_{u \in [\underline{a}, \underline{b}]} |h^{(j)}(u)| .$$

By induction on $j$, we thus get with (41) that there is a constant $C_5$ such that

$$\|\tilde\varphi^r \tilde h^{(r)}\|_{H'} \le C_5 \left(\|\varphi^r h^{(r)}\|_H + \sum_{j=1}^{r-1} \inf_{u \in [\underline{a}, \underline{b}]} |h^{(j)}(u)|\right). \tag{42}$$

The final step of the proof consists in bounding $\inf_{u \in [\underline{a}, \underline{b}]} |h^{(j)}(u)|$ for $j = 1, \dots, r-1$. Let $\delta_j = \inf_{u \in [\underline{a}, \underline{b}]} |h^{(j)}(u)|$. Then for any $v, v' \in [\underline{a}, \underline{b}]$, we have $|h^{(j-1)}(v') - h^{(j-1)}(v)| \ge \delta_j |v' - v|$. Suppose that $v$ is in the first third of the segment $[\underline{a}, \underline{b}]$ and $v'$ in the last third, so that $|v - v'| \ge (\underline{b} - \underline{a})/3$. On the other hand $|h^{(j-1)}(v') - h^{(j-1)}(v)| \le |h^{(j-1)}(v')| + |h^{(j-1)}(v)|$. It follows that $|h^{(j-1)}(v')|$ and $|h^{(j-1)}(v)|$ cannot both be less than $\delta_j (\underline{b} - \underline{a})/6$, which provides a lower bound of $|h^{(j-1)}|$ on at least one sub-interval of $[\underline{a}, \underline{b}]$ of length $(\underline{b} - \underline{a})/3$. Proceeding recursively, we get that there exists a sub-interval of $[\underline{a}, \underline{b}]$ on which $|h|$ is lower bounded by $\delta_j$ multiplied by some constant. This in turn gives that

$$\sum_{j=1}^{r-1} \inf_{u \in [\underline{a}, \underline{b}]} |h^{(j)}(u)| \le C_6 \|h\|_H ,$$

where $C_6$ is a constant only depending on $\underline{a}$, $\underline{b}$ and $r$. Observe that, since $f \in \widetilde{\mathcal{C}}(\alpha, C)_H$, we have $\|f\|_H \le C$. Using (37), $t \in (0, 1]$ and $\|h\|_H \le \|f - h\|_H + \|f\|_H$ in the last display, we thus get

$$\sum_{j=1}^{r-1} \inf_{u \in [\underline{a}, \underline{b}]} |h^{(j)}(u)| \le C_6 (2M + 1) C .$$

Finally, this bound, (42), (40) and (37) yield (38) and the proof is complete. □
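The change-of-variables estimate behind (39), namely $\|g \circ \tau\|_{H'}^2 = \int_a^b g^2(u) / |\tau'(\tau^{-1}(u))|\, du \le (\inf |\tau'|)^{-1} \|g\|_H^2$, can be checked numerically. The Python sketch below uses the hypothetical choice $\tau(s) = s^2 + 1$ on $[a', b'] = [0.5, 1.5]$, which maps onto $[a, b] = [1.25, 3.25]$ with $\inf |\tau'| = \tau'(a') = 1$, and an arbitrary test function `g`.

```python
import numpy as np

def gauss_legendre(lo, hi, n=200):
    x, w = np.polynomial.legendre.leggauss(n)
    return 0.5 * (hi - lo) * x + 0.5 * (hi + lo), 0.5 * (hi - lo) * w

ap, bp = 0.5, 1.5
tau = lambda s: s ** 2 + 1.0                 # increasing bijection [a', b'] -> [a, b]
dtau = lambda s: 2.0 * s                     # tau' is increasing, so inf |tau'| = tau'(a')
a, b = tau(ap), tau(bp)                      # [a, b] = [1.25, 3.25]

g = lambda u: np.exp(-u) * np.cos(4 * u)     # arbitrary test function on [a, b]

s, ws = gauss_legendre(ap, bp)
u, wu = gauss_legendre(a, b)
lhs = np.sum(ws * g(tau(s)) ** 2)                      # ||g o tau||^2 in L2([a', b'])
exact = np.sum(wu * g(u) ** 2 / dtau(np.sqrt(u - 1)))  # same integral after u = tau(s)
bound = np.sum(wu * g(u) ** 2) / abs(dtau(ap))         # ||g||^2 / inf |tau'|
print(lhs, exact, bound)
```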

Lemma 6. Let $(p_k)$ be the sequence of polynomials defined by $p_1(t) = 1$, $p_2(t) = t$, $p_k(t) = t(t+1)\cdots(t+k-2)$ for all $k \ge 2$. Define the coefficients $(c_{k,l})_{1 \le l \le k}$ by the expansion formula $t^{k-1} = \sum_{l=1}^k c_{k,l}\, p_l(t)$, valid for $k = 1, 2, \dots$. Then $c_{1,1} = 1$, and for all $k \ge 2$,

$$c_{k,1} = 0, \quad c_{k,k} = 1 \quad \text{and} \quad c_{k,l} = c_{k-1,l-1} - (l-1)\, c_{k-1,l} \quad \text{for all } l = 2, \dots, k-1 . \tag{43}$$

Moreover, we have, for all $k \ge 1$,

$$\sum_{l=1}^k |c_{k,l}| \le (k-1)! . \tag{44}$$

Proof. By definition of $p_l$, we have $t\, p_l(t) = p_{l+1}(t) - (l-1)\, p_l(t)$ for any $l \ge 1$. Hence, for any $k \ge 2$, writing $t^{k-1} = t\, t^{k-2} = \sum_l c_{k-1,l}\, t\, p_l(t)$, we obtain (43).

We now prove (44). It is obviously true for $k = 1$. From (43), it follows that, for all $k \ge 2$,

$$\sum_{l=1}^k |c_{k,l}| \le \sum_{l=1}^{k-1} l\, |c_{k-1,l}| .$$

Bounding $l$ inside the last sum by $(k-1)$ yields (44) by induction. □
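The recursion (43) and the bound (44) are easy to explore with exact integer arithmetic. The Python sketch below builds $c_{k,l}$ from (43) and prints each row with its absolute sum; the cut-off $K = 10$ is arbitrary. The coefficients are signed Stirling numbers of the second kind, $c_{k,l} = (-1)^{k-l} S(k-1, l-1)$, so the absolute row sums are the Bell numbers $1, 1, 2, 5, 15, \dots$, which indeed stay below $(k-1)!$.

```python
from math import factorial

def c_coeffs(K):
    """c[k][l], 1 <= l <= k <= K, built from c_{1,1} = 1 and the recursion (43)."""
    c = {1: {1: 1}}
    for k in range(2, K + 1):
        prev = c[k - 1]
        row = {1: 0, k: 1}
        for l in range(2, k):
            row[l] = prev[l - 1] - (l - 1) * prev[l]
        c[k] = row
    return c

def p(l, t):
    """p_1(t) = 1 and p_l(t) = t (t + 1) ... (t + l - 2) for l >= 2."""
    out = 1
    for j in range(l - 1):
        out *= t + j
    return out

K = 10
c = c_coeffs(K)
for k in range(1, K + 1):
    print(k, [c[k][l] for l in range(1, k + 1)], sum(abs(v) for v in c[k].values()))
```

Evaluating both sides of $t^{k-1} = \sum_l c_{k,l}\, p_l(t)$ at $k$ or more distinct integers confirms the polynomial identity exactly.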